Preface


First came the 32-bit SPARC Version 7 (V7) architecture, publicly released in 1987. 
Shortly after, the SPARC V8 architecture was announced and published in book 
form. The 64-bit SPARC V9 architecture was released in 1994. Now, the 
UltraSPARC Architecture specification provides the first significant update in over 
10 years to Sun’s SPARC processor architecture. 


What’s New? 


For the first time, UltraSPARC Architecture 2005 pulls together in one document all 
parts of the architecture: 


■ the nonprivileged (Level 1) architecture from SPARC V9
■ most of the privileged (Level 2) architecture from SPARC V9
■ more in-depth coverage of all SPARC V9 features

Plus, it includes all of Sun’s now-standard architectural extensions (beyond SPARC
V9), developed through the processor generations of UltraSPARC III, IV, IV+, and
T1:

■ the VIS™ 1 and VIS 2 instruction set extensions and the associated GSR register
■ multiple levels of global registers, controlled by the GL register
■ Sun's 64-bit MMU architecture
■ privileged instructions ALLCLEAN, OTHERW, NORMALW, and INVALW
■ access to the VER register is now hyperprivileged
■ the SIR instruction is now hyperprivileged




In addition, architectural features are now tagged with Software Classes and
Implementation Classes. Software Classes provide a new, high-level view of the
expected architectural longevity and portability of software that references those 
features. Implementation Classes give an indication of how efficiently each feature 
is likely to be implemented across current and future UltraSPARC Architecture 
processor implementations. This information provides guidance that should be 
particularly helpful to programmers who write in assembly language or those who 
write tools that generate SPARC instructions. It also provides the infrastructure for 
defining clear procedures for adding and removing features from the architecture 
over time, with minimal software disruption.

CHAPTER 1


Document Overview 





This chapter discusses: 


■ Navigating UltraSPARC Architecture 2005
■ Fonts and Notational Conventions
■ Reporting Errors in this Specification





1.1 Navigating UltraSPARC Architecture 2005


If you are new to the SPARC architecture, read Chapter 3, Architecture Overview, 
study the definitions in Chapter 2, Definitions, then look into the subsequent sections 
and appendixes for more details in areas of interest to you. 


If you are familiar with the SPARC V9 architecture but not UltraSPARC Architecture 
2005, note that UltraSPARC Architecture 2005 conforms to the SPARC V9 Level 1 
architecture (and most of Level 2), with numerous extensions, particularly with
respect to VIS instructions.

This specification is structured as follows:

■ Chapter 2, Definitions, defines key terms used throughout the specification
■ Chapter 3, Architecture Overview, provides an overview of UltraSPARC
  Architecture 2005
■ Chapter 4, Data Formats, describes the supported data formats
■ Chapter 5, Registers, describes the register set
■ Chapter 6, Instruction Set Overview, provides a high-level description of the
  UltraSPARC Architecture 2005 instruction set
■ Chapter 7, Instructions, describes the UltraSPARC Architecture 2005 instruction set
  in great detail
■ Chapter 8, IEEE Std 754-1985 Requirements for UltraSPARC Architecture 2005,
  describes the IEEE Std 754-1985 floating-point requirements that apply to
  UltraSPARC Architecture 2005
■ Chapter 9, Memory, describes the supported memory model
■ Chapter 10, Address Space Identifiers (ASIs), provides a complete list of supported
  ASIs
■ Chapter 11, Performance Instrumentation, describes the architecture for performance
  monitoring hardware
■ Chapter 12, Traps, describes the trap model
■ Chapter 13, Interrupt Handling, describes how interrupts are handled
■ Chapter 14, Memory Management, describes MMU operation
■ Appendix A, Opcode Maps, provides the overall picture of how the instruction set
  is mapped into opcodes
■ Appendix B, Implementation Dependencies, describes all implementation
  dependencies
■ Appendix C, Assembly Language Syntax, describes extensions to the SPARC
  assembly language syntax; in particular, synthetic instructions are documented in
  this appendix


1.2 Fonts and Notational Conventions


Fonts are used as follows: 


■ Italic font is used for emphasis, book titles, and the first instance of a word that is
  defined.
■ Italic font is also used for terms where substitution is expected, for example,
  "fccn", "virtual processor n", or "reg_plus_imm".
■ Italic sans serif font is used for exception and trap names. For example, “The
  privileged_action exception...”
■ lowercase helvetica font is used for register field names (named bits) and
  instruction field names, for example: “The rs1 field contains...”
■ UPPERCASE HELVETICA font is used for register names; for example, FSR.
■ TYPEWRITER (Courier) font is used for literal values, such as code (assembly
  language, C language, ASI names) and for state names. For example: %f0,
  ASI_PRIMARY, execute_state.

















■ When a register field is shown along with its containing register name, they are
  separated by a period ('.'), for example, "FSR.cexc".

■ UPPERCASE words are acronyms or instruction names. Some common acronyms
  appear in the glossary in Chapter 2, Definitions. Note: Names of some instructions
  contain both upper- and lower-case letters.

■ An underscore character joins words in register, register field, exception, and trap
  names. Note: Such words may be split across lines at the underbar without an
  intervening hyphen. For example: “This is true whenever the integer_condition_
  code field...”


The following notational conventions are used: 


■ The left arrow symbol ( ← ) is the assignment operator. For example,
  “PC ← PC + 1” means that the Program Counter (PC) is incremented by 1.


■ Square brackets ( [ ] ) are used in two different ways, distinguishable by the
  context in which they are used:

  ■ Square brackets indicate indexing into an array. For example, TT[TL] means the
    element of the Trap Type (TT) array, as indexed by the contents of the Trap
    Level (TL) register.

  ■ Square brackets are also used to indicate optional additions/extensions to
    symbol names. For example, “ST[D | Q]F” expands to all three of “STF”,
    “STDF”, and “STQF”. Similarly, ASI_PRIMARY[_LITTLE] indicates two
    related address space identifiers, ASI_PRIMARY and ASI_PRIMARY_LITTLE.
    (Contrast with the use of angle brackets, below.)








■ Angle brackets ( < > ) indicate mandatory additions/extensions to symbol names.
  For example, “ST<D | Q>F” expands to mean “STDF” and “STQF”. (Contrast with
  the second use of square brackets, above.)


■ Curly braces ( { } ) indicate a bit field within a register or instruction. For example,
  CCR{4} refers to bit 4 in the Condition Code Register.


■ A consecutive set of values is indicated by specifying the upper and lower limit of
  the set separated by a colon ( : ), for example, CCR{3:0} refers to the set of four
  least significant bits of register CCR. (Contrast with the use of double periods,
  below.)


■ A double period ( .. ) indicates that any single intermediate value between two given
  end values is possible. For example, NAME[2..0] indicates four forms of NAME
  exist: NAME, NAME2, NAME1, and NAME0; whereas NAME<2..0> indicates
  that three forms exist: NAME2, NAME1, and NAME0. (Contrast with the use of
  the colon, above.)


■ A vertical bar ( | ) separates mutually exclusive alternatives inside square
  brackets ( [ ] ), angle brackets ( < > ), or curly braces ( { } ). For example,
  “NAME[A | B]” expands to “NAME, NAMEA, NAMEB” and “NAME<A | B>”
  expands to “NAMEA, NAMEB”.


■ The asterisk ( * ) is used as a wild card, encompassing the full set of valid values.
  For example, FCMP* refers to FCMP with all valid suffixes (in this case,
  FCMP<s | d | q> and FCMPE<s | d | q>). An asterisk is typically used when the full
  list of valid values either is not worth listing (because it has little or no relevance
  in the given context) or the valid values are too numerous to list in the available
  space.


■ The slash ( / ) is used to separate paired or complementary values in a list, for
  example, “the LDBLOCKF/STBLOCKF instruction pair ....”


■ The double colon (::) is an operator that indicates concatenation (typically, of bit
  vectors). Concatenation strictly strings the specified component values into a
  single longer string, in the order specified. The concatenation operator performs
  no arithmetic operation on any of the component values.
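
The curly-brace bit fields, colon ranges, and the :: concatenation operator in the list above map naturally onto shifts and masks. The following is a minimal sketch in C, not part of the specification; the helper names bit_field, bit_range, and concat_bits are hypothetical, and registers are assumed to fit in 64 bits.

```c
#include <stdint.h>

/* REG{bit}: extract a single bit, for example CCR{4}.
 * Assumes bit <= 63. */
static inline uint64_t bit_field(uint64_t reg, unsigned bit)
{
    return (reg >> bit) & UINT64_C(1);
}

/* REG{hi:lo}: extract the consecutive bits hi down to lo, for example
 * CCR{3:0}. Assumes 63 >= hi >= lo. */
static inline uint64_t bit_range(uint64_t reg, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    /* A 64-bit shift would be undefined in C, so handle the full width
     * separately. */
    uint64_t mask = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    return (reg >> lo) & mask;
}

/* high :: low, where low is low_width bits wide. The bits are placed side
 * by side in the order given; no arithmetic is performed on either value.
 * Assumes 0 < low_width < 64 and that the result fits in 64 bits. */
static inline uint64_t concat_bits(uint64_t high, uint64_t low,
                                   unsigned low_width)
{
    uint64_t low_mask = (UINT64_C(1) << low_width) - 1;
    return (high << low_width) | (low & low_mask);
}
```

For instance, with a hypothetical variable ccr holding a copy of the CCR, bit_range(ccr, 3, 0) corresponds to CCR{3:0} and bit_field(ccr, 4) corresponds to CCR{4}.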


1.2.1 Implementation Dependencies 


Implementors of UltraSPARC Architecture 2005 processors are allowed to resolve 
some aspects of the architecture in machine-dependent ways. 


The definition of each implementation dependency is indicated by the notation 
“IMPL. DEP. #nn-XX: Some descriptive text". The number nn provides an index into 
the complete list of dependencies in Appendix B, Implementation Dependencies. 


A reference to (but not definition of) an implementation dependency is indicated by 
the notation "(impl. dep. nn)". 


1.2.2 Notation for Numbers 


Numbers throughout this specification are decimal (base-10) unless otherwise
indicated. Numbers in other bases are followed by a numeric subscript indicating
their base (for example, 1001₂, FFFF 0000₁₆). Long binary and hexadecimal numbers
within the text have spaces inserted every four characters to improve readability.
Within C language or assembly language examples, numbers may be preceded by
“0x” to indicate base-16 (hexadecimal) notation (for example, 0xFFFF0000).
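
As an illustration of this notation, the sketch below converts a number written the way the specification text writes it (digits in a stated base, optionally grouped by spaces) into a C integer. It is only an example; the function name parse_grouped is hypothetical, and it relies on the standard C library function strtoul.

```c
#include <stdlib.h>

/* Convert a number written as in the specification text (digits in the
 * given base, with optional spaces between groups of digits) to an integer.
 * At most 64 digits are considered; longer inputs are truncated. */
static unsigned long parse_grouped(const char *text, int base)
{
    char digits[65];
    size_t n = 0;

    /* Copy the digits, dropping the readability spaces. */
    for (; *text != '\0' && n < sizeof digits - 1; ++text) {
        if (*text != ' ')
            digits[n++] = *text;
    }
    digits[n] = '\0';

    return strtoul(digits, NULL, base);
}
```

With this helper, parse_grouped("1001", 2) yields 9, and parse_grouped("FFFF 0000", 16) yields the same value as the C literal 0xFFFF0000.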


1.2.3 Informational Notes 


This guide provides several different types of information in notes, as follows: 


Note                  | General notes contain incidental information relevant to the
                      | paragraph preceding the note.

Programming           | Programming notes contain incidental information about how
Note                  | software can use an architectural feature.

Implementation        | An Implementation Note contains incidental information,
Note                  | describing how an UltraSPARC Architecture 2005 processor
                      | might implement an architectural feature.

V9 Compatibility      | Note containing information about possible differences between
Note                  | UltraSPARC Architecture 2005 and SPARC V9 implementations.
                      | Such information is relevant to UltraSPARC Architecture 2005
                      | implementations and might not apply to other SPARC V9
                      | implementations.

Forward               | Note containing information about how the UltraSPARC
Compatibility         | Architecture is expected to evolve in the future. Such notes are
Note                  | not intended as a guarantee that the architecture will evolve as
                      | indicated, but as a guide to features that should not be depended
                      | upon to remain the same, by software intended to run on both
                      | current and future implementations.


1.3 Reporting Errors in this Specification 


This specification has been reviewed for completeness and accuracy. Nonetheless, as 
with any document this size, errors and omissions may occur, and reports of such 
are welcome. Please send “bug reports” and other comments on this document to 
the email address: UA-editor@sun.com

CHAPTER 2


Definitions 





This chapter defines concepts and terminology common to all implementations of 
UltraSPARC Architecture 2005. 


address space A range of 2^64 locations that can be addressed by instruction fetches
and load, store, or load-store instructions. See also address space identifier (ASI).

address space identifier (ASI) An 8-bit value that identifies a particular address
space. An ASI is (implicitly or explicitly) associated with every instruction access
or data access. See also implicit ASI.

aliased Said of each of two virtual or real addresses that refer to the same underlying
memory location.

application program A program executed with the virtual processor in nonprivileged
mode. Note: Statements made in this specification regarding application programs may
not be applicable to programs (for example, debuggers) that have access to
privileged virtual processor state (for example, as stored in a memory-image
dump).

ASI Address space identifier.

ASR Ancillary State register.

big-endian An addressing convention. Within a multiple-byte integer, the byte with the
smallest address is the most significant; a byte's significance decreases as its
address increases.

BLD (Obsolete) abbreviation for Block Load instruction; replaced by LDBLOCKF.

BST (Obsolete) abbreviation for Block Store instruction; replaced by STBLOCKF.

byte Eight consecutive bits of data, aligned on an 8-bit boundary.

CCR Abbreviation for Condition Codes Register.

clean window A register window in which each of the registers contains 0, a valid
address from the current address space, or valid data from the current address space.
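
The big-endian entry above can be made concrete with a short C sketch. This is an illustration only, not text from the specification; the function names load_be32 and store_be32 are hypothetical. The byte at the smallest address is treated as the most significant, and significance decreases as the address increases.

```c
#include <stdint.h>

/* Read four consecutive bytes as a big-endian 32-bit integer:
 * p[0] (smallest address) is the most significant byte. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Write a 32-bit integer in big-endian byte order:
 * the most significant byte goes to the smallest address. */
static void store_be32(uint8_t *p, uint32_t value)
{
    p[0] = (uint8_t)(value >> 24);
    p[1] = (uint8_t)(value >> 16);
    p[2] = (uint8_t)(value >> 8);
    p[3] = (uint8_t)value;
}
```

Because the functions work byte by byte, they give the same result regardless of the byte order of the machine that runs them.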


coherence A set of protocols guaranteeing that all memory accesses are globally visible
to all caches on a shared-memory bus.

completed (memory operation) Said of a memory transaction when an idealized memory
has executed the transaction with respect to all processors. A load is considered
completed when no subsequent memory transaction can affect the value returned by the
load. A store is considered completed when no subsequent load can return the
value that was overwritten by the store.

context A set of translations that defines a particular address space. See also Memory
Management Unit (MMU).

context ID A numeric value that uniquely identifies a particular context.

copyback The process of sending a copy of the data from a cache line owned by a
physical processor core, in response to a snoop request from another device.

CPI Cycles per instruction. The number of clock cycles it takes to execute an
instruction.

cross-call An interprocessor call in a system containing multiple virtual processors.

CTI Abbreviation for control-transfer instruction.

current window The block of 24 R registers that is presently in use. The Current Window
Pointer (CWP) register points to the current window.

data access (instruction) A load, store, load-store, or FLUSH instruction.

DCTI Delayed control transfer instruction.

denormalized number Synonym for subnormal number.

deprecated The term applied to an architectural feature (such as an instruction or
register) for which an UltraSPARC Architecture implementation provides support only
for compatibility with previous versions of the architecture. Use of a
deprecated feature must generate correct results but may compromise software
performance.

Deprecated features should not be used in new UltraSPARC Architecture
software and may not be supported in future versions of the architecture.

doubleword An 8-byte datum. Note: The definition of this term is architecture dependent
and may differ from that used in other processor architectures.

even parity The mode of parity checking in which each combination of data bits plus a
parity bit contains an even number of ‘1’ bits.
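
The even parity entry above can be sketched in C as follows. This is an illustrative example under the assumption of a data word of up to 64 bits; the function names even_parity_bit and even_parity_ok are hypothetical and do not come from the specification.

```c
#include <stdint.h>

/* Count the '1' bits in a data word. */
static unsigned count_ones(uint64_t data)
{
    unsigned ones = 0;
    while (data != 0) {
        ones += (unsigned)(data & 1u);
        data >>= 1;
    }
    return ones;
}

/* Parity bit to attach for even parity: chosen so that the data bits plus
 * the parity bit together contain an even number of '1' bits. */
static unsigned even_parity_bit(uint64_t data)
{
    return count_ones(data) & 1u;
}

/* Check a received word: returns 1 when data plus parity bit contain an
 * even number of '1' bits, 0 otherwise. */
static int even_parity_ok(uint64_t data, unsigned parity_bit)
{
    return ((count_ones(data) + (parity_bit & 1u)) % 2u) == 0;
}
```

For the data value 0x7 (three '1' bits), even_parity_bit returns 1, giving four '1' bits in total.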
exception A condition that makes it impossible for the processor to continue executing
the current instruction stream. Some exceptions may be masked (that is, trap
generation disabled; for example, floating-point exceptions masked by
FSR.tem) so that the decision on whether or not to apply special processing
can be deferred and made by software at a later time. See also trap.

explicit ASI An ASI that is provided by a load, store, or load-store alternate
instruction (either from its imm_asi field or from the ASI register).

extended word An 8-byte datum, nominally containing integer data. Note: The definition
of this term is architecture dependent and may differ from that used in other
processor architectures.

fccn One of the floating-point condition code fields fcc0, fcc1, fcc2, or fcc3.

FGU Floating-point and Graphics Unit (which most implementations specify as a
superset of FPU).

floating-point exception An exception that occurs during the execution of a
floating-point operate (FPop) instruction. The exceptions are unfinished_FPop,
unimplemented_FPop, sequence_error, hardware_error, invalid_fp_register, or
IEEE_754_exception.

F register A floating-point register. The SPARC V9 architecture includes single-,
double-, and quad-precision F registers.

floating-point operate instructions Instructions that perform floating-point
calculations, as defined in Floating-Point Operate (FPop) Instructions on page 119.
FPop instructions do not include FBfcc instructions, loads and stores between memory
and the F registers, or non-floating-point operations that read or write F registers.

floating-point trap type The specific type of a floating-point exception, encoded in
the FSR.ftt field.

floating-point unit A processing unit that contains the floating-point registers and
performs floating-point operations, as defined by this specification.

FPop Abbreviation for floating-point operate (instructions).

FPRS Floating-Point Register State register.

FPU Floating-Point Unit.

FSR Floating-Point Status register.

GL Global Level register.

GSR General Status register.

halfword A 2-byte datum. Note: The definition of this term is architecture dependent
and may differ from that used in other processor architectures.
hyperprivileged An adjective that describes:
(1) the state of the processor when the processor is in hyperprivileged mode;
(2) processor state that is only accessible to software while the processor is in
hyperprivileged mode

IEEE 754 IEEE Standard 754-1985, the IEEE Standard for Binary Floating-Point
Arithmetic.

IEEE-754 exception A floating-point exception, as specified by IEEE Std 754-1985.
Listed within this specification as IEEE_754_exception.

implementation Hardware or software that conforms to all of the specifications of an
instruction set architecture (ISA).

implementation dependent An aspect of the UltraSPARC Architecture that can legitimately
vary among implementations. In many cases, the permitted range of variation is
specified. When a range is specified, compliant implementations must not deviate from
that range.

implicit ASI An address space identifier that is implicitly supplied by the virtual
processor on all instruction accesses and on data accesses that do not explicitly
provide an ASI value (from either an imm_asi instruction field or the ASI register).

initiated Synonym for issued.

instruction field A bit field within an instruction word.

instruction group One or more independent instructions that can be dispatched for
simultaneous execution.

instruction set architecture A set that defines instructions, registers, instruction
and data memory, the effect of executed instructions on the registers and memory, and
an algorithm for controlling instruction execution. Does not define clock cycle times,
cycles per instruction, data paths, etc. This specification defines the UltraSPARC
Architecture 2005 instruction set architecture.

integer unit A processing unit that performs integer and control-flow operations and
contains general-purpose integer registers and virtual processor state registers,
as defined by this specification.

interrupt request A request for service presented to a virtual processor by an
external device.

inter-strand Describes an operation that crosses virtual processor (strand) boundaries.

intra-strand Describes an operation that occurs entirely within one virtual processor
(strand).

invalid (ASI or address) Undefined, reserved, or illegal.

ISA Instruction set architecture.
issued A memory transaction (load, store, or atomic load-store) is said to be “issued”
when a virtual processor has sent the transaction to the memory subsystem
and the completion of the request is out of the virtual processor’s control.
Synonym for initiated.

IU Integer Unit.

little-endian An addressing convention. Within a multiple-byte integer, the byte with
the smallest address is the least significant; a byte’s significance increases as its
address increases.

load An instruction that reads (but does not write) memory or reads (but does not
write) location(s) in an alternate address space. Examples of loads include loads
into integer or floating-point registers, block loads, and alternate address space
variants of those instructions. See also load-store and store, the definitions of
which are mutually exclusive with load.

load-store An instruction that explicitly both reads and writes memory or explicitly
reads and writes location(s) in an alternate address space. Load-store includes
instructions such as CASA, CASXA, LDSTUB, and the deprecated SWAP
instruction. See also load and store, the definitions of which are mutually
exclusive with load-store.

may A keyword indicating flexibility of choice with no implied preference. Note:
“may” indicates that an action or operation is allowed; “can” indicates that it is
possible.

Memory Management Unit The address translation hardware in an UltraSPARC Architecture
implementation that translates 64-bit virtual addresses into underlying hardware
addresses. The MMU is composed of the ASRs and ASI registers used to
manage address translation. See also context, real address, and virtual
address.

MMU Abbreviation for Memory Management Unit.

multiprocessor system A system containing more than one processor.

must A keyword indicating a mandatory requirement. Designers must implement
all such mandatory requirements to ensure interoperability with other
UltraSPARC Architecture-compliant products. Synonym for shall.

next program counter Conceptually, a register that contains the address of the
instruction to be executed next if a trap does not occur.

NFO Nonfault access only.

nonfaulting load A load operation that behaves identically to a normal load operation,
except when supplied an invalid effective address by software. In that case, a regular
load triggers an exception whereas a nonfaulting load appears to ignore the
exception and loads its destination register with a value of zero (on an
UltraSPARC Architecture processor, hardware treats regular and nonfaulting
loads identically; the distinction is made in trap handler software). Contrast
with speculative load.

nonprivileged An adjective that describes
(1) the state of the virtual processor when PSTATE.priv = 0, that is, when
it is in nonprivileged mode;
(2) virtual processor state information that is accessible to software regardless
of the current privilege mode; for example, nonprivileged registers,
nonprivileged ASRs, or, in general, nonprivileged state;
(3) an instruction that can be executed in any privilege mode (privileged
or nonprivileged).

nonprivileged mode The mode in which a virtual processor is operating when executing
application software (at the lowest privilege level). Nonprivileged mode is defined by
PSTATE.priv = 0. See also privileged and hyperprivileged.

nontranslating ASI An ASI that does not refer to memory (for example, refers to
control/status register(s)) and for which the MMU does not perform address translation.

NPC Next program counter.

npt Nonprivileged trap.

nucleus software Privileged software running at a trap level greater than 0 (TL > 0).

NUMA Nonuniform memory access.

N_REG_WINDOWS The number of register windows present in a particular implementation.

octlet Eight bytes (64 bits) of data. Not to be confused with “octet,” which has been
commonly used to describe eight bits of data. In this document, the term byte,
rather than octet, is used to describe eight bits of data.

odd parity The mode of parity checking in which each combination of data bits plus a
parity bit together contain an odd number of ‘1’ bits.

opcode A bit pattern that identifies a particular instruction.

optional A feature not required for UltraSPARC Architecture 2005 compliance.

PC Program counter.

PCR Performance Control register.

physical processor Synonym for processor; used when an explicit contrast needs to be
drawn between processor and virtual processor. See also processor and virtual
processor.

PIC Performance Instrumentation Counter.

PIL Processor Interrupt Level register.
pipeline Refers to an execution pipeline, the basic collection of hardware needed to
execute instructions. See also processor, strand, thread, and virtual processor.

prefetchable (1) An attribute of a memory location that indicates to an MMU that
PREFETCH operations to that location may be applied.
(2) A memory location condition for which the system designer has
determined that no undesirable effects will occur if a PREFETCH operation to
that location is allowed to succeed. Typically, normal memory is prefetchable.

Nonprefetchable locations include those that, when read, change state or cause
external events to occur. For example, some I/O devices are designed with
registers that clear on read; others have registers that initiate operations when
read. See also side effect.

privileged An adjective that describes:
(1) the state of the virtual processor when PSTATE.priv = 1,
that is, when the virtual processor is in privileged mode;
(2) processor state that is only accessible to software while the virtual processor
is in privileged mode; for example, privileged registers, privileged ASRs,
or, in general, privileged state;
(3) an instruction that can be executed only when the virtual processor is in
privileged mode.

privileged mode The mode in which a processor is operating when PSTATE.priv = 1.
See also nonprivileged and hyperprivileged.

processor The unit on which a shared interface is provided to control the configuration
and execution of a collection of strands; a physical module that plugs into a
system. Synonym for processor module. See also pipeline, strand, thread, and
virtual processor.

processor core Synonym for physical core.

processor module Synonym for processor.

program counter A register that contains the address of the instruction currently
being executed.

quadword A 16-byte datum. Note: The definition of this term is architecture dependent
and may be different from that used in other processor architectures.

R register An integer register. Also called a general-purpose register or working
register.

RA Real address.

RAS Reliability, Availability, and Serviceability

RAW Read After Write (hazard)

rd Rounding direction.
real address An address produced by a virtual processor that refers to a particular
software-visible memory location, as viewed from privileged mode. Virtual addresses
are usually translated by a combination of hardware and software to real
addresses, which can be used to access real memory. See also virtual address.


reserved Describing an instruction field, certain bit combinations within an instruction 
field, or a register field that is reserved for definition by future versions of the 
architecture. 


A reserved instruction field must read as 0, unless the implementation supports 
extended instructions within the field. The behavior of an UltraSPARC 
Architecture 2005 virtual processor when it encounters a nonzero value in a 
reserved instruction field is as defined in Reserved Opcodes and Instruction Fields 
on page 120. 


A reserved bit combination within an instruction field is defined in Chapter 7, 
Instructions. In all cases, an UltraSPARC Architecture 2005 processor must 
decode and trap on such reserved bit combinations. 


A reserved field within a register reads as 0 in current implementations and, when 
written by software, should always be written with values of that field 
previously read from that register or with the value zero (as described in 
Reserved Register Fields on page 46). 


Throughout this specification, figures and tables illustrating registers and 
instruction encodings indicate reserved fields and reserved bit combinations 
with a wide ("em") dash (—). 


restricted Describes an address space identifier (ASI) that may be accessed only while the 
virtual processor is operating in privileged mode. 


retired An instruction is said to be “retired” when one of the following two events has 
occurred: 
(1) A precise trap has been taken, with TPC containing the instruction's 
address (the instruction has not changed architectural state in this case). 
(2) The instruction's execution has progressed to a point at which architectural 
state affected by the instruction has been updated such that all three of the 
following are true: 


■ The PC has advanced beyond the instruction.

■ Except for deferred trap handlers, no consumer in the same instruction
stream can see the old values and all consumers in the same instruction
stream will see the new values.

■ Stores are visible to all loads in the same instruction stream, including stores
to noncacheable locations.


RMO Abbreviation for Relaxed Memory Order (a memory model). 
RTO Read to Own (a type of transaction, used to request ownership of a cache line). 


RTS Read to Share (a type of transaction, used to request read-only access to a 
cache line). 


shall Synonym for must. 
should A keyword indicating flexibility of choice with a strongly preferred
implementation. Synonym for it is recommended.

side effect The result of a memory location having additional actions beyond the
reading or writing of data. A side effect can occur when a memory operation on that
location is allowed to succeed. Locations with side effects include those that,
when accessed, change state or cause external events to occur. For example,
some I/O devices contain registers that clear on read; others have registers that
initiate operations when read. See also prefetchable.

SIMD Single Instruction/Multiple Data; a class of instructions that perform identical
operations on multiple data contained (or “packed”) in each source operand.

speculative load A load operation that is issued by a virtual processor speculatively,
that is, before it is known whether the load will be executed in the flow of the
program. Speculative accesses are used by hardware to speed program
execution and are transparent to code. An implementation, through a
combination of hardware and system software, must nullify speculative loads
on memory locations that have side effects; otherwise, such accesses produce
unpredictable results. Contrast with nonfaulting load.

store An instruction that writes (but does not explicitly read) memory or writes (but
does not explicitly read) location(s) in an alternate address space. Examples of
stores include stores from either integer or floating-point registers,
block stores, Partial Store, and alternate address space variants of those
instructions. See also load and load-store, the definitions of which are
mutually exclusive with store.

strand The hardware state that must be maintained in order to execute a software
thread. See also pipeline, processor, thread, and virtual processor.

subnormal number A nonzero floating-point number, the exponent of which has a value of
zero. A more complete definition is provided in IEEE Standard 754-1985.

superscalar An implementation that allows several instructions to be issued, executed,
and committed in one clock cycle.

supervisor software Software that executes when the virtual processor is in privileged
mode.

synchronization An operation that causes the processor to wait until the effects of all
previous instructions are completely visible before any subsequent instructions are
executed.

system A set of virtual processors that share a common hardware memory address
space.

taken A control-transfer instruction (CTI) is taken when the CTI writes the target
address value into NPC.

A trap is taken when the control flow changes in response to an exception,
reset, Tcc instruction, or interrupt. An exception must be detected and
recognized before it can cause a trap to be taken.
TBA Trap base address.

thread A software entity that can be executed on hardware. See also pipeline,
processor, strand, and virtual processor.

TNPC Trap-saved next program counter.
TPC Trap-saved program counter.

trap The action taken by a virtual processor when it changes the instruction flow in
response to the presence of an exception, reset, a Tcc instruction, or an
interrupt. The action is a vectored transfer of control to more-privileged
software through a table, the address of which is specified by the privileged
Trap Base Address (TBA) register. See also exception.

TSB Translation storage buffer. A table of the address translations that is
maintained by software in system memory and that serves as a cache of
virtual-to-real address mappings.

TSO Total Store Order (a memory model).

TTE Translation Table Entry. Describes the virtual-to-real translation and page
attributes for a specific page in the page table. In some cases, this term is
explicitly used to refer to entries in the TSB.

UA-2005 UltraSPARC Architecture 2005

unassigned A value (for example, an ASI number), the semantics of which are not
architecturally mandated and which may be determined independently by
each implementation within any guidelines given.

undefined An aspect of the architecture that has deliberately been left unspecified.
Software should have no expectation of, nor make any assumptions about, an
undefined feature or behavior. Use of such a feature can deliver unexpected
results and may or may not cause a trap. An undefined feature may vary
among implementations, and may also vary over time on a given
implementation.

Notwithstanding any of the above, undefined aspects of the architecture shall
not cause security holes (such as changing the privilege state or allowing
circumvention of normal restrictions imposed by the privilege state), put a
virtual processor into a more-privileged mode, or put the virtual processor into
an unrecoverable state.

unimplemented An architectural feature that is not directly executed in hardware
because it is optional or is emulated in software.

unpredictable Synonym for undefined.
uniprocessor system A system containing a single virtual processor.

unrestricted Describes an address space identifier (ASI) that can be used in all
privileged modes; that is, regardless of the value of PSTATE.priv.
user application program Synonym for application program.
VA Abbreviation for virtual address.

virtual address An address produced by a virtual processor that refers to a particular
software-visible memory location. Virtual addresses usually are translated by a
combination of hardware and software to real addresses, which can be used to
access real memory. See also real address.

virtual core, virtual processor core Synonyms for virtual processor.

virtual processor The term virtual processor, or virtual processor core, is used to
identify each strand in a processor. At any given time, an operating system can have a
different thread scheduled on each virtual processor. See also pipeline,
processor, strand, and thread.

VIS Abbreviation for VIS™ Instruction Set.
VP Abbreviation for virtual processor.

word A 4-byte datum. Note: The definition of this term is architecture dependent
and may differ from that used in other processor architectures.
CHAPTER 3


Architecture Overview 





The UltraSPARC Architecture supports 32-bit and 64-bit integer and 32-bit, 64-bit,
and 128-bit floating-point as its principal data types. The 32-bit and 64-bit
floating-point types conform to IEEE Std 754-1985. The 128-bit floating-point type
conforms to IEEE Std 1596.5-1992. The architecture defines general-purpose integer,
floating-point, and special state/status register instructions, all encoded in
32-bit-wide instruction formats. The load/store instructions address a linear,
2^64-byte virtual address space.


The UltraSPARC Architecture 2005 specification describes a processor architecture
with which Sun Microsystems' SPARC processor implementations (beginning with
UltraSPARC T1) comply. Future implementations are expected to comply with either
this document or a later revision of this document.


The UltraSPARC Architecture 2005 is a descendant of the SPARC V9 architecture and 
complies fully with the "Level 1" (nonprivileged) SPARC V9 specification. 


Nonprivileged (application) software that is intended to be portable across all
SPARC V9 processors should be written to adhere to The SPARC Architecture Manual,
Version 9.


Material in this document specific to UltraSPARC Architecture 2005 processors may 
not apply to SPARC V9 processors produced by other vendors. 


In this specification, the word architecture refers to the processor features that are 
visible to an assembly language programmer or to a compiler code generator. It does 
not include details of the implementation that are not visible or easily observable by 
software, nor those that only affect timing (performance). 
3.1 The UltraSPARC Architecture 2005


This section briefly describes features, attributes, and components of the 
UltraSPARC Architecture 2005 and, further, describes correct implementation of the 
architecture specification and SPARC V9-compliance levels. 


3.1.1 Features 


The UltraSPARC Architecture 2005, like its ancestor SPARC V9, includes the 
following principal features: 


■ A linear 64-bit address space with 64-bit addressing.

■ 32-bit wide instructions: These are aligned on 32-bit boundaries in memory.
Only load and store instructions access memory and perform I/O.

■ Few addressing modes: A memory address is given as either “register +
register” or “register + immediate”.

■ Triadic register addresses: Most computational instructions operate on two
register operands or one register and a constant and place the result in a third
register.

■ A large windowed register file: At any one instant, a program sees 8 global
integer registers plus a 24-register window of a larger register file. The windowed
registers can be used as a cache of procedure arguments, local values, and return
addresses.

■ Floating point: The architecture provides an IEEE 754-compatible floating-point
instruction set, operating on a separate register file that provides 32
single-precision (32-bit), 32 double-precision (64-bit), and 16 quad-precision
(128-bit) overlaid registers.

■ Fast trap handlers: Traps are vectored through a table.

■ Multiprocessor synchronization instructions: Multiple variations of atomic
load-store memory operations are supported.

■ Predicted branches: The branch with prediction instructions allow the
compiler or assembly language programmer to give the hardware a hint about
whether a branch will be taken.

■ Branch elimination instructions: Several instructions can be used to eliminate
branches altogether (for example, Move on Condition). Eliminating branches
increases performance in superscalar and superpipelined implementations.

■ Hardware trap stack: A hardware trap stack is provided to allow nested traps.
It contains all of the machine state necessary to return to the previous trap level.
The trap stack makes the handling of faults and error conditions simpler, faster,
and safer.
In addition, UltraSPARC Architecture 2005 includes the following features that were
not present in the SPARC V9 specification:

■ Hyperprivileged mode, which simplifies porting of operating systems, supports
far greater portability of operating system (privileged) software, and supports the
ability to run multiple simultaneous guest operating systems. (Hyperprivileged
mode is described in detail in the Hyperprivileged version of this specification.)

■ Multiple levels of global registers: Instead of the two 8-register sets of global
registers specified in the SPARC V9 architecture, UltraSPARC Architecture 2005
provides multiple sets; typically, one set is used at each trap level.

■ Extended instruction set: UltraSPARC Architecture 2005 provides many
instruction set extensions, including the VIS instruction set for "vector" (SIMD)
data operations.

■ More detailed, specific instruction descriptions: UltraSPARC Architecture
2005 provides many more details regarding what exceptions can be generated by
each instruction and the specific conditions under which those exceptions can
occur. Also, detailed lists of valid ASIs are provided for each load/store
instruction from/to alternate space.

■ Detailed MMU architecture: UltraSPARC Architecture 2005 provides a
blueprint for the software view of the UltraSPARC MMU (TTEs and TSBs).


3.1.2 Attributes


UltraSPARC Architecture 2005 is a processor instruction set architecture (ISA) derived
from SPARC V8 and SPARC V9, which in turn come from a reduced instruction set
computer (RISC) lineage. As an architecture, UltraSPARC Architecture 2005 allows
for a spectrum of processor and system implementations at a variety of
price/performance points for a range of applications, including
scientific/engineering, programming, real-time, and commercial applications.


3.1.2.1 Design Goals 


UltraSPARC Architecture 2005 is designed to be a target for
optimizing compilers and high-performance hardware implementations. This
specification documents the UltraSPARC Architecture 2005 and provides a design
spec against which an implementation can be verified, using appropriate verification
software.


3.1.2.2 Register Windows 


UltraSPARC Architecture 2005 is derived from the SPARC
architecture, which was formulated at Sun Microsystems from 1984 through 1987. The
SPARC architecture is, in turn, based on the RISC I and II designs engineered at the
University of California at Berkeley from 1980 through 1982. The SPARC “register
window” architecture, pioneered in the UC Berkeley designs, allows for
straightforward, high-performance compilers and a reduction in memory load/store
instructions.


Note that privileged software, not user programs, manages the register windows. 
Privileged software can save a minimum number of registers (approximately 24) 
during a context switch, thereby optimizing context-switch latency. 


3.1.3 System Components


The UltraSPARC Architecture 2005 allows for a spectrum of subarchitectures, such
as the cache system.


3.1.3.1 Binary Compatibility 


The most important mandate for the UltraSPARC Architecture is compatibility 
across implementations of the architecture for application (nonprivileged) software, 
down to the binary level. Binaries executed in nonprivileged mode should behave 
identically on all UltraSPARC Architecture systems when those systems are running 
an operating system known to provide a standard execution environment. One 
example of such a standard environment is the SPARC V9 Application Binary 
Interface (ABI). 


Although different UltraSPARC Architecture 2005 systems can execute 
nonprivileged programs at different rates, they will generate the same results as long 
as they are run under the same memory model. See Chapter 9, Memory, for more 
information. 


Additionally, UltraSPARC Architecture 2005 is binary upward-compatible from 
SPARC V9 for applications running in nonprivileged mode that conform to the 
SPARC V9 ABI and upward-compatible from SPARC V8 for applications running in 
nonprivileged mode that conform to the SPARC V8 ABI. 


3.1.3.2 UltraSPARC Architecture 2005 MMU 


Although the SPARC V9 architecture allows its implementations freedom in their 
MMU designs, UltraSPARC Architecture 2005 defines a common MMU architecture 
(see Chapter 14, Memory Management) with some specifics left to implementations 
(see processor implementation documents). 


3.1.3.3 Privileged Software


UltraSPARC Architecture 2005 does not assume that all implementations must 
execute identical privileged software (operating systems). Thus, certain traits that 
are visible to privileged software may be tailored to the requirements of the system. 
3.1.4 Architectural Definition


The UltraSPARC Architecture 2005 is defined by the chapters and appendixes of this
specification. A correct implementation of the architecture interprets a program
strictly according to the rules and algorithms specified in the chapters and
appendixes.


UltraSPARC Architecture 2005 defines a set of implementations that conform to the
SPARC V9 architecture, Level 1.


3.1.5 UltraSPARC Architecture 2005 Compliance with SPARC V9 Architecture


UltraSPARC Architecture 2005 fully complies with SPARC V9 Level 1
(nonprivileged). It partially complies with SPARC V9 Level 2 (privileged).


3.1.6 Implementation Compliance with UltraSPARC Architecture 2005


Compliant implementations must not add to or deviate from this standard except in
aspects described as implementation dependent. Appendix B, Implementation
Dependencies, lists all UltraSPARC Architecture 2005, SPARC V9, and SPARC V8
implementation dependencies. Documents for specific UltraSPARC Architecture
2005 processor implementations describe the manner in which implementation
dependencies have been resolved in those implementations.


IMPL. DEP. #1-V8: Whether an instruction complies with UltraSPARC Architecture 
2005 by being implemented directly by hardware, simulated by software, or 
emulated by firmware is implementation dependent. 





3.2 Processor Architecture


An UltraSPARC Architecture processor logically consists of an integer unit (IU) and
a floating-point unit (FPU), each with its own registers. This organization allows for
implementations with concurrent integer and floating-point instruction execution.
Integer registers are 64 bits wide; floating-point registers are 32, 64, or 128 bits wide.
Instruction operands are single registers, register pairs, register quadruples, or
immediate constants.
An UltraSPARC Architecture virtual processor can run in nonprivileged mode,
privileged mode, or in mode(s) of greater privilege. In privileged mode, the processor
can execute nonprivileged and privileged instructions. In nonprivileged mode, the
processor can execute only nonprivileged instructions. In nonprivileged or
privileged mode, an attempt to execute an instruction requiring greater privilege
than the current mode causes a trap.


3.2.1 Integer Unit (IU)


An UltraSPARC Architecture 2005 implementation's integer unit contains the
general-purpose registers and controls the overall operation of the virtual processor.
The IU executes the integer arithmetic instructions and computes memory addresses
for loads and stores. It also maintains the program counters and controls instruction
execution for the FPU.


IMPL. DEP. #2-V8: An UltraSPARC Architecture implementation may contain from 
72 to 640 general-purpose 64-bit R registers. This corresponds to a grouping of the 
registers into MAXPGL + 1 sets of global R registers plus a circular stack of 
N_REG_WINDOWS sets of 16 registers each, known as register windows. The number 
of register windows present (N_REG_WINDOWS) is implementation dependent, within 
the range of 3 to 32 (inclusive). 
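
The arithmetic behind the 72-to-640 range can be sketched as follows. This is only an illustration: the function name is hypothetical, and the MAXPGL values 2 and 15 are assumptions chosen because, with 8 registers per global set, they reproduce the stated bounds.

```python
def total_r_registers(maxpgl: int, n_reg_windows: int) -> int:
    """Count R registers: (MAXPGL + 1) global sets of 8, plus
    N_REG_WINDOWS window sets of 16 registers each."""
    if not 3 <= n_reg_windows <= 32:
        raise ValueError("N_REG_WINDOWS must be in the range 3..32")
    global_sets = maxpgl + 1
    return 8 * global_sets + 16 * n_reg_windows


# MAXPGL = 2 and 15 are assumed values that match the stated limits.
print(total_r_registers(2, 3))    # 72
print(total_r_registers(15, 32))  # 640
```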


3.2.2 Floating-Point Unit (FPU)


An UltraSPARC Architecture 2005 implementation’s FPU has thirty-two 32-bit 
(single-precision) floating-point registers, thirty-two 64-bit (double-precision) 
floating-point registers, and sixteen 128-bit (quad-precision) floating-point registers, 
some of which overlap. 


If no FPU is present, then it appears to software as if the FPU is permanently 
disabled. 


If the FPU is not enabled, then an attempt to execute a floating-point instruction
generates an fp_disabled trap and the fp_disabled trap handler software must either

■ enable the FPU (if present) and reexecute the trapping instruction, or
■ emulate the trapping instruction in software.





3.3 Instructions


Instructions fall into the following basic categories: 


■ Memory access
■ Integer arithmetic / logical / shift
■ Control transfer
■ State register access
■ Floating-point operate
■ Conditional move
■ Register window management
■ SIMD (single instruction, multiple data) instructions


These classes are discussed in the following subsections. 


3.3.1 Memory Access


Load, store, load-store, and PREFETCH instructions are the only instructions that 
access memory. They use two R registers or an R register and a signed 13-bit 
immediate value to calculate a 64-bit, byte-aligned memory address. The Integer 
Unit appends an ASI to this address. 
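
As a rough model of the two address forms described above, the following sketch computes an address from two register values or from a register value and a raw 13-bit immediate field. The function names are hypothetical, and wrapping the sum to 64 bits is a modeling assumption.

```python
MASK64 = (1 << 64) - 1  # addresses are 64 bits wide


def sign_extend(field: int, bits: int) -> int:
    """Interpret the low `bits` bits of `field` as a two's-complement value."""
    field &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (field ^ sign_bit) - sign_bit


def address_reg_reg(rs1: int, rs2: int) -> int:
    """Address formed from two R registers."""
    return (rs1 + rs2) & MASK64


def address_reg_imm(rs1: int, simm13: int) -> int:
    """Address formed from an R register and a signed 13-bit immediate field."""
    return (rs1 + sign_extend(simm13, 13)) & MASK64


print(hex(address_reg_reg(0x1000, 0x20)))    # 0x1020
print(hex(address_reg_imm(0x1000, 0x1FFF)))  # 0xfff (immediate is -1)
```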


The destination field of the load/store instruction specifies either one or two R 
registers or one, two, or four F registers that supply the data for a store or that 
receive the data from a load. 


Integer load and store instructions support byte, halfword (16-bit), word (32-bit),
and extended-word (64-bit) accesses. There are versions of integer load instructions
that perform either sign-extension or zero-extension on 8-bit, 16-bit, and 32-bit
values as they are loaded into a 64-bit destination register. Floating-point load and
store instructions support word, doubleword, and quadword¹ memory accesses.
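
The difference between sign-extension and zero-extension on an integer load can be illustrated with a small model. The helper name is hypothetical; the data is taken as big-endian bytes, and the result is shown as the 64-bit register image.

```python
MASK64 = (1 << 64) - 1


def load_integer(data: bytes, signed: bool) -> int:
    """Return the 64-bit register image after loading 1, 2, or 4 bytes."""
    value = int.from_bytes(data, "big", signed=signed)
    return value & MASK64  # a negative value becomes its two's-complement image


print(hex(load_integer(b"\x80", signed=True)))   # 0xffffffffffffff80
print(hex(load_integer(b"\x80", signed=False)))  # 0x80
```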


CASA, CASXA, and LDSTUB are special atomic memory access instructions that 
concurrent processes use for synchronization and memory updates. 


Note | The SWAP instruction is also specified, but it is deprecated and 
should not be used in newly developed software. 


The (nonportable) LDTXA instruction supplies an atomic 128-bit (16-byte) load that 
is important in certain system software applications. 


3.3.1.1 Memory Alignment Restrictions 


A memory access on an UltraSPARC Architecture virtual processor must typically be
aligned on an address boundary greater than or equal to the size of the datum being
accessed. An improperly aligned address in a load, store, or load-store instruction
may trigger an exception and cause a subsequent trap. For details, see Memory
Alignment Restrictions on page 102.
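
A minimal sketch of the natural-alignment check implied above, assuming power-of-two datum sizes; the function name is hypothetical, and the exceptions to the rule covered in the referenced section are not modeled.

```python
def is_naturally_aligned(address: int, size: int) -> bool:
    """True if `address` is a multiple of `size` (size must be a power of two)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return address & (size - 1) == 0


print(is_naturally_aligned(0x1008, 8))  # True: doubleword on an 8-byte boundary
print(is_naturally_aligned(0x1004, 8))  # False: only 4-byte aligned
```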


1. No UltraSPARC Architecture processor currently implements the LDQF instruction in hardware; it generates
an exception and is emulated in software running at a higher privilege level.


3.3.1.2 Addressing Conventions


The UltraSPARC Architecture uses big-endian byte order by default: the address of a 
quadword, doubleword, word, or halfword is the address of its most significant 
byte. Increasing the address means decreasing the significance of the unit being 
accessed. All instruction accesses are performed using big-endian byte order. 


The UltraSPARC Architecture also supports little-endian byte order for data accesses 
only: the address of a quadword, doubleword, word, or halfword is the address of 
its least significant byte. Increasing the address means increasing the significance of 
the data unit being accessed. 


Addressing conventions are illustrated in FIGURE 6-2 on page 105 and FIGURE 6-3 on 
page 107. 
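
The two byte orders can be illustrated with a short sketch. The memory image and helper name are hypothetical; the point is only which byte sits at the lowest address.

```python
value = 0x0A0B0C0D  # a 32-bit word

big = value.to_bytes(4, "big")        # most significant byte at lowest address
little = value.to_bytes(4, "little")  # least significant byte at lowest address

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a


def load_word(memory: bytes, address: int, little_endian: bool = False) -> int:
    """Read a 4-byte word starting at `address` using the chosen byte order."""
    order = "little" if little_endian else "big"
    return int.from_bytes(memory[address:address + 4], order)


print(hex(load_word(big, 0)))           # 0xa0b0c0d
print(hex(load_word(little, 0, True)))  # 0xa0b0c0d
```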


3.3.1.3 Addressing Range


IMPL. DEP. #405-S10: An UltraSPARC Architecture implementation may support a
full 64-bit virtual address space or a more limited range of virtual addresses. In an
implementation that does not support a full 64-bit virtual address space, the
supported range of virtual addresses is restricted to two equal-sized ranges at the
extreme upper and lower ends of 64-bit addresses; that is, for n-bit virtual addresses,
the valid address ranges are 0 to 2^(n-1) - 1 and 2^64 - 2^(n-1) to 2^64 - 1.
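
For an implementation with n-bit virtual addresses, the two valid ranges are the lowest 2^(n-1) and the highest 2^(n-1) addresses of the 64-bit space. The check below is a sketch of that rule; the function name and the choice of n = 48 in the example are assumptions for illustration only.

```python
def is_valid_va(va: int, n: int) -> bool:
    """True if the 64-bit address `va` lies in one of the two supported ranges."""
    half = 1 << (n - 1)
    top = 1 << 64
    in_low_range = 0 <= va <= half - 1
    in_high_range = top - half <= va <= top - 1
    return in_low_range or in_high_range


# Example with a hypothetical 48-bit implementation.
print(is_valid_va(0x0000_7FFF_FFFF_FFFF, 48))  # True: top of the low range
print(is_valid_va(0x0000_8000_0000_0000, 48))  # False: inside the unsupported gap
print(is_valid_va(0xFFFF_8000_0000_0000, 48))  # True: bottom of the high range
```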


3.3.1.4 Load/Store Alternate


Versions of load/store instructions, the load/store alternate instructions, can specify an
arbitrary 8-bit address space identifier for the load/store data access.

Access to alternate spaces 00₁₆-2F₁₆ is restricted to privileged software, access to
alternate spaces 30₁₆-7F₁₆ is restricted to hyperprivileged software, and access to
alternate spaces 80₁₆-FF₁₆ is unrestricted. Some of the ASIs are available for
implementation-dependent uses. Privileged software can use the
implementation-dependent ASIs to access special protected registers, such as cache
control registers, virtual processor state registers, and other processor-dependent or
system-dependent values. See Address Space Identifiers (ASIs) on page 108 for more
information.
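
The three ASI ranges (0x00-0x2F, 0x30-0x7F, and 0x80-0xFF) can be expressed as a simple classifier. The function name and the returned strings are hypothetical labels, not architectural terms beyond those used in the text.

```python
def asi_access_class(asi: int) -> str:
    """Classify an 8-bit ASI by the privilege needed to use it."""
    if not 0x00 <= asi <= 0xFF:
        raise ValueError("an ASI is an 8-bit value")
    if asi <= 0x2F:
        return "privileged"
    if asi <= 0x7F:
        return "hyperprivileged"
    return "unrestricted"


print(asi_access_class(0x2F))  # privileged
print(asi_access_class(0x30))  # hyperprivileged
print(asi_access_class(0x80))  # unrestricted
```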


Alternate space addressing is also provided for the atomic memory access 
instructions LDSTUBA, CASA, and CASXA. 


Note | The SWAPA instruction is also specified, but it is deprecated and 
should not be used in newly developed software. 
3.3.1.5 Separate Instruction and Data Memories


The interpretation of addresses can be unified, in which case the same translations 
and caching are applied to both instructions and data. Alternatively, addresses can 
be "split", in which case instruction references use one caching and translation 
mechanism and data references use another, although the same underlying main 
memory is shared. 


In such split-memory systems, the coherency mechanism may be split, so a write¹
into data memory is not immediately reflected in instruction memory. For this
reason, programs that modify their own instruction stream (self-modifying code²)
and that wish to be portable across all UltraSPARC Architecture (and SPARC V9)
processors must issue FLUSH instructions, or a system call with a similar effect, to
bring the instruction and data caches into a consistent state.


An UltraSPARC Architecture virtual processor may or may not have coherent 
instruction and data caches. Even if an implementation does have coherent 
instruction and data caches, a FLUSH instruction is required for self-modifying code 
— not for cache coherency, but to flush pipeline instruction buffers that contain 
unmodified instructions which may have been subsequently modified. 


3.3.1.6 Input/Output (I/O) 


The UltraSPARC Architecture assumes that input/output registers are accessed
through load/store alternate instructions, normal load/store instructions, or
read/write Ancillary State Register instructions (RDasr, WRasr).


IMPL. DEP. #123-V9: The semantic effect of accessing input/output (I/O) locations 
is implementation dependent. 


IMPL. DEP. #6-V8: Whether the I/O registers can be accessed by nonprivileged code 
is implementation dependent. 


IMPL. DEP. #7-V8: The addresses and contents of I/O registers are implementation 
dependent. 


3.3.1.7 Memory Synchronization 


Two instructions are used for synchronization of memory operations: FLUSH and 
MEMBAR. Their operation is explained in Flush Instruction Memory on page 174 and 
Memory Barrier on page 260, respectively. 


Note | STBAR is also available, but it is deprecated and should not be 
used in newly developed software. 


1. This includes use of store instructions (executed on the same or another virtual processor) that write to
instruction memory, or any other means of writing into instruction memory (for example, DMA).


2. This is practiced, for example, by software such as debuggers and dynamic linkers.
3.3.2 Integer Arithmetic / Logical / Shift Instructions


The arithmetic/logical/shift instructions perform arithmetic, tagged arithmetic, 
logical, and shift operations. With one exception, these instructions compute a result 
that is a function of two source operands; the result is either written into a 
destination register or discarded. The exception, SETHI, can be used in combination 
with other arithmetic and/or logical instructions to create a constant in an R register. 
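
One common way SETHI is combined with a logical instruction is to build a 32-bit constant in two steps. The sketch below assumes that SETHI places its 22-bit immediate in bits 31:10 of the destination and clears the remaining bits, and that a following OR supplies the low 10 bits; that detail is not stated in this section, and the function names are hypothetical.

```python
def sethi(imm22: int) -> int:
    """Model of SETHI: the 22-bit immediate lands in bits 31:10, other bits are 0."""
    return (imm22 & 0x3FFFFF) << 10


def build_constant32(value: int) -> int:
    """Build a 32-bit constant as SETHI(high 22 bits) OR (low 10 bits)."""
    high22 = (value >> 10) & 0x3FFFFF  # upper part, supplied by SETHI
    low10 = value & 0x3FF              # lower part, supplied by OR
    return sethi(high22) | low10


print(hex(build_constant32(0xDEADBEEF)))  # 0xdeadbeef
```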


Shift instructions shift the contents of an R register left or right by a given number of 
bits ("shift count"). The shift distance is specified by a constant in the instruction or 
by the contents of an R register. 


3.3.3 Control Transfer


Control-transfer instructions (CTIs) include PC-relative branches and calls, register- 
indirect jumps, and conditional traps. Most of the control-transfer instructions are 
delayed; that is, the instruction immediately following a control-transfer instruction 
in logical sequence is dispatched before the control transfer to the target address is 
completed. Note that the next instruction in logical sequence may not be the 
instruction following the control-transfer instruction in memory. 


The instruction following a delayed control-transfer instruction is called a delay 
instruction. Setting the annul bit in a conditional delayed control-transfer instruction 
causes the delay instruction to be annulled (that is, to have no effect) if and only if 
the branch is not taken. Setting the annul bit in an unconditional delayed control- 
transfer instruction ("branch always") causes the delay instruction to be always 
annulled. 
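
The annul-bit rules in the preceding paragraph reduce to a small decision function. This is a sketch of the stated rules only; the function and parameter names are hypothetical.

```python
def delay_instruction_executes(conditional: bool, annul: bool, taken: bool) -> bool:
    """Decide whether the delay instruction of a delayed CTI takes effect."""
    if not annul:
        return True   # annul bit clear: the delay instruction always executes
    if conditional:
        return taken  # annulled if and only if the branch is not taken
    return False      # unconditional CTI with annul bit set: always annulled


print(delay_instruction_executes(conditional=True, annul=True, taken=False))  # False
print(delay_instruction_executes(conditional=True, annul=True, taken=True))   # True
```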


Note | The SPARC V8 architecture specified that the delay instruction 
was always fetched, even if annulled, and that an annulled 
instruction could not cause any traps. The SPARC V9 
architecture does not require the delay instruction to be fetched 
if it is annulled. 


Branch and CALL instructions use PC-relative displacements. The jump and link
(JMPL) and return (RETURN) instructions use a register-indirect target address.
They compute their target addresses either as the sum of two R registers or as the
sum of an R register and a 13-bit signed immediate value. The “branch on condition
codes without prediction” instruction provides a displacement of ±8 Mbytes; the
“branch on condition codes with prediction” instruction provides a displacement of
±1 Mbyte; the “branch on register contents” instruction provides a displacement of
±128 Kbytes; and the CALL instruction's 30-bit word displacement allows a control
transfer to any address within ±2 gigabytes (±2^31 bytes).
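
The displacement ranges follow from the width of a signed word displacement: an n-bit field reaches about 2^(n-1) words, or 2^(n-1) × 4 bytes, in either direction. The 30-bit CALL width comes from the text; the 22-, 19-, and 16-bit widths used below are assumptions chosen because they reproduce the stated ranges.

```python
def reach_bytes(disp_bits: int) -> int:
    """Approximate reach, in bytes, of a signed word displacement of `disp_bits` bits."""
    words = 1 << (disp_bits - 1)  # magnitude of the most negative displacement
    return words * 4              # instructions are 4 bytes wide


MBYTE = 1 << 20
KBYTE = 1 << 10

print(reach_bytes(22) // MBYTE)  # 8   (branch on condition codes without prediction)
print(reach_bytes(19) // MBYTE)  # 1   (branch on condition codes with prediction)
print(reach_bytes(16) // KBYTE)  # 128 (branch on register contents)
print(reach_bytes(30) == 2**31)  # True (CALL)
```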


Note | The return from privileged trap instructions (DONE and 
RETRY) get their target address from the appropriate TPC or 
TNPC register. 
3.3.4 State Register Access


3.3.4.1 Ancillary State Registers


The read and write ancillary state register instructions read and write the contents of 
ancillary state registers visible to nonprivileged software (Y, CCR, ASI, PC, TICK, 
and FPRS) and some registers visible only to privileged software (PCR, SOFTINT, 
TICK_CMPR, and STICK_CMPR). 


IMPL. DEP. #8-V8-Cs20: Ancillary state registers (ASRs) in the range 0-27 that are 
not defined in UltraSPARC Architecture 2005 are reserved for future architectural 
use. ASRs in the range 28-31 are available to be used for implementation-dependent 
purposes. 


IMPL. DEP. #9-V8-Cs20: The privilege level required to execute each of the 
implementation-dependent read/write ancillary state register instructions (for ASRs 
28-31) is implementation dependent. 


3.3.4.2 PR State Registers 


The read and write privileged register instructions (RDPR and WRPR) read and 
write the contents of state registers visible only to privileged software (TPC, TNPC, 
TSTATE, TT, TICK, TBA, PSTATE, TL, PIL, CWP, CANSAVE, CANRESTORE, 
CLEANWIN, OTHERWIN, and WSTATE). 


3.3.5 Floating-Point Operate


Floating-point operate (FPop) instructions perform all floating-point calculations; 
they are register-to-register instructions that operate on the floating-point registers. 
FPops compute a result that is a function of one or two source operands. The groups 
of instructions that are considered FPops are listed in Floating-Point Operate (FPop) 
Instructions on page 119. 


3.3.6 Conditional Move


Conditional move instructions conditionally copy a value from a source register to a 
destination register, depending on an integer or floating-point condition code or on 
the contents of an integer register. These instructions can be used to reduce the 
number of branches in software. 
3.3.7 Register Window Management


Register window instructions manage the register windows. SAVE and RESTORE 
are nonprivileged and cause a register window to be pushed or popped. FLUSHW is 
nonprivileged and causes all of the windows except the current one to be flushed to 
memory. SAVED and RESTORED are used by privileged software to end a window 
spill or fill trap handler. 


3.3.8 SIMD


UltraSPARC Architecture 2005 includes SIMD (single instruction, multiple data) 
instructions, also known as "vector" instructions, which allow a single instruction to 
perform the same operation on multiple data items, totalling 64 bits, such as eight 8- 
bit, four 16-bit, or two 32-bit data items. These operations are part of the "VIS" 
extensions. 
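
The idea of one operation applied to several packed items in a 64-bit operand can be modeled as a lane-wise addition. This is a generic sketch, not the definition of any particular VIS instruction; the function name is hypothetical, and per-lane wraparound on overflow is an assumption.

```python
def partitioned_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 64-bit operands lane by lane; carries do not cross lane boundaries."""
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16, or 32")
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift  # drop the carry out of the lane
    return result


# Four 16-bit lanes: the lowest lane wraps from 0xFFFF to 0x0000.
print(hex(partitioned_add(0x0001_0002_0003_FFFF, 0x0001_0001_0001_0001, 16)))
# 0x2000300040000
```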





3.4 Traps


A trap is a vectored transfer of control to privileged software through a trap table 
that may contain the first 8 instructions (32 for some frequently used traps) of each 
trap handler. The base address of the table is established by software in a state 
register (the Trap Base Address register, TBA). The displacement within the table is
encoded in the type number of each trap and the level of the trap. Part of the trap 
table is reserved for hardware traps, and part of it is reserved for software traps 
generated by trap (Tcc) instructions. 
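
How a trap type and trap level select a table entry can be sketched as below. The bit layout is an assumption taken from the SPARC V9 convention (table base in TBA bits 63:15, one bit for "trap level greater than zero", a 9-bit trap type, and 32-byte entries holding 8 instructions); this section does not spell it out, and Chapter 12 is the authority. The function name is hypothetical.

```python
def trap_vector_address(tba: int, trap_type: int, trap_level: int) -> int:
    """Compute a trap handler address under the assumed SPARC V9 layout."""
    base = tba & ~0x7FFF                       # keep bits 63:15 of TBA
    level_bit = 1 if trap_level > 0 else 0     # TL before the trap was taken
    offset = (level_bit << 14) | ((trap_type & 0x1FF) << 5)  # 32-byte entries
    return base | offset


print(hex(trap_vector_address(0x40_0000, 0x024, 0)))  # 0x400480
print(hex(trap_vector_address(0x40_0000, 0x024, 1)))  # 0x404480
```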


A trap causes the current PC and NPC to be saved in the TPC and TNPC registers. 
It also causes the CCR, ASI, PSTATE, and CWP registers to be saved in TSTATE. 
TPC, TNPC, and TSTATE are entries in a hardware trap stack, where the number of 
entries in the trap stack is equal to the number of supported trap levels. A trap also 
sets bits in the PSTATE register and typically increments the GL register. Normally, 
the CWP is not changed by a trap; on a window spill or fill trap, however, the CWP 
is changed to point to the register window to be saved or restored. 


A trap can be caused by a Tcc instruction, an asynchronous exception, an
instruction-induced exception, or an interrupt request not directly related to a
particular instruction. Before executing each instruction, a virtual processor
determines if there
are any pending exceptions or interrupt requests. If any are pending, the virtual 
processor selects the highest-priority exception or interrupt request and causes a 
trap. 


See Chapter 12, Traps, for a complete description of traps. 
CHAPTER 4


Data Formats 





The UltraSPARC Architecture recognizes these fundamental data types: 

■ Signed integer: 8, 16, 32, and 64 bits
■ Unsigned integer: 8, 16, 32, and 64 bits
■ SIMD data formats: Uint8 SIMD (32 bits), Int16 SIMD (64 bits), and Int32 SIMD
  (64 bits)
■ Floating point: 32, 64, and 128 bits


The widths of the data types are as follows: 

Byte: 8 bits 

Halfword: 16 bits 

Word: 32 bits 

Tagged word: 32 bits (30-bit value plus 2-bit tag) 
Doubleword/Extended-word: 64 bits 
Quadword: 128 bits 


The signed integer values are stored as two’s-complement numbers with a width 
commensurate with their range. Unsigned integer values, bit vectors, Boolean 
values, character strings, and other values representable in binary form are stored as 
unsigned integers with a width commensurate with their range. The floating-point 
formats conform to the IEEE Standard for Binary Floating-point Arithmetic, IEEE 
Std 754-1985. In tagged words, the least significant two bits are treated as a tag; the 
remaining 30 bits are treated as a signed integer. 
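
As an illustration of the tagged-word layout described above, the following Python sketch splits a 32-bit tagged word into its 2-bit tag and its 30-bit signed value, and rebuilds the word from those parts. The function names are hypothetical and chosen only for this example; they are not part of the architecture.

```python
def split_tagged_word(word: int) -> tuple[int, int]:
    """Return (value, tag) for a 32-bit tagged word.

    The tag is the least significant two bits; the remaining 30 bits are
    interpreted as a two's-complement signed integer.
    """
    word &= 0xFFFFFFFF
    tag = word & 0b11
    value = word >> 2                 # upper 30 bits, still unsigned here
    if value & (1 << 29):             # sign bit of the 30-bit field
        value -= 1 << 30
    return value, tag


def make_tagged_word(value: int, tag: int) -> int:
    """Build a 32-bit tagged word from a 30-bit signed value and a 2-bit tag."""
    if not -(1 << 29) <= value < (1 << 29):
        raise ValueError("value does not fit in 30 signed bits")
    if not 0 <= tag <= 3:
        raise ValueError("tag must be in the range 0..3")
    return ((value & 0x3FFFFFFF) << 2) | tag
```

For example, the word 0x0000002A carries the value 10 with tag 2.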


Data formats are described in these sections: 

■ Integer Data Formats on page 34.
■ Floating-Point Data Formats on page 38.
■ SIMD Data Formats on page 41.


Names are assigned to individual subwords of the multiword data formats as
described in these sections:

■ Signed Integer Doubleword (64 bits) on page 35.
■ Unsigned Integer Doubleword (64 bits) on page 37.
■ Floating Point, Double Precision (64 bits) on page 39.
■ Floating Point, Quad Precision (128 bits) on page 40.
4.1 Integer Data Formats


TABLE 4-1 describes the width and ranges of the signed, unsigned, and tagged integer 
data formats. 


TABLE 4-1 Signed Integer, Unsigned Integer, and Tagged Format Ranges

                                              Width
Data Type                                     (bits)   Range
Signed integer byte                              8     -2^{7} to 2^{7} - 1
Signed integer halfword                         16     -2^{15} to 2^{15} - 1
Signed integer word                             32     -2^{31} to 2^{31} - 1
Signed integer doubleword/extended-word         64     -2^{63} to 2^{63} - 1
Unsigned integer byte                            8     0 to 2^{8} - 1
Unsigned integer halfword                       16     0 to 2^{16} - 1
Unsigned integer word                           32     0 to 2^{32} - 1
Unsigned integer doubleword/extended-word       64     0 to 2^{64} - 1
Integer tagged word                             32     0 to 2^{30} - 1
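
The signed and unsigned ranges in TABLE 4-1 follow directly from the width of each format. A minimal Python sketch (the helper names are illustrative only) that reproduces them:

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Range of a two's-complement integer that is `bits` wide."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def unsigned_range(bits: int) -> tuple[int, int]:
    """Range of an unsigned integer that is `bits` wide."""
    return 0, (1 << bits) - 1


for width in (8, 16, 32, 64):
    print(width, signed_range(width), unsigned_range(width))
```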


TABLE 4-2 describes the memory and register alignment for multiword integer data. 
All registers in the integer register file are 64 bits wide, but can be used to contain 
smaller (narrower) data sizes. Note that there is no difference between integer 
extended-words and doublewords in memory; the only difference is how they are 
represented in registers. 


TABLE 4-2 Integer Doubleword / Extended-word Alignment

                                           Memory Address                      Register Number
Subformat                                  Required            Address         Required            Register
Name       Subformat Field                 Alignment           (big-endian)¹   Alignment           Number
SD-0       signed_dbl_integer{63:32}       n mod 8 = 0         n               r mod 2 = 0         r
SD-1       signed_dbl_integer{31:0}        (n + 4) mod 8 = 4   n + 4           (r + 1) mod 2 = 1   r + 1
SX         signed_ext_integer{63:0}        n mod 8 = 0         n               —                   r
UD-0       unsigned_dbl_integer{63:32}     n mod 8 = 0         n               r mod 2 = 0         r
UD-1       unsigned_dbl_integer{31:0}      (n + 4) mod 8 = 4   n + 4           (r + 1) mod 2 = 1   r + 1
UX         unsigned_ext_integer{63:0}      n mod 8 = 0         n               —                   r

1. The Memory Address in this table applies to big-endian memory accesses. Word and byte order are reversed when
   little-endian accesses are used.
The data types are illustrated in the following subsections.


4.1.1 Signed Integer Data Types


Figures in this section illustrate the following signed data types:

■ Signed integer byte
■ Signed integer halfword
■ Signed integer word
■ Signed integer doubleword
■ Signed integer extended-word


4.1.1.1 Signed Integer Byte, Halfword, and Word 


FIGURE 4-1 illustrates the signed integer byte, halfword, and word data formats.


SB   | S |              |
       7   6           0

SH   | S |              |
      15  14           0

SW   | S |              |
      31  30           0


FIGURE 4-1 Signed Integer Byte, Halfword, and Word Data Formats


4.1.1.2 Signed Integer Doubleword (64 bits)


FIGURE 4-2 illustrates both components (SD-0 and SD-1) of the signed integer double
data format.


SD-0 | S | signed_int_doubleword{62:32} |
      31  30                           0

SD-1 | signed_int_doubleword{31:0}      |
      31                               0


FIGURE 4-2 Signed Integer Double Data Format
4.1.1.3 Signed Integer Extended-Word (64 bits)


FIGURE 4-3 illustrates the signed integer extended-word (SX) data format.


SX   | S | signed_int_extended |
      63  62                  0


FIGURE 4-3 Signed Integer Extended-Word Data Format


4.1.2 Unsigned Integer Data Types 


Figures in this section illustrate the following unsigned data types: 


■ Unsigned integer byte
■ Unsigned integer halfword
■ Unsigned integer word
■ Unsigned integer doubleword
■ Unsigned integer extended-word


4.1.2.1 Unsigned Integer Byte, Halfword, and Word 


FIGURE 4-4 illustrates the unsigned integer byte, halfword, and word data formats.


UB   |                  |
       7               0

UH   |                  |
      15               0

UW   |                  |
      31               0


FIGURE 4-4 Unsigned Integer Byte, Halfword, and Word Data Formats
4.1.2.2 Unsigned Integer Doubleword (64 bits)


FIGURE 4-5 illustrates both components (UD-0 and UD-1) of the unsigned integer
double data format.


UD-0 | unsigned_int_doubleword{63:32} |
      31                             0

UD-1 | unsigned_int_doubleword{31:0}  |
      31                             0


FIGURE 4-5 Unsigned Integer Double Data Format


4.1.2.3 Unsigned Extended Integer (64 bits) 


FIGURE 4-6 illustrates the unsigned extended integer (UX) data format. 


UX   | unsigned_int_extended |
      63                    0





FIGURE 4-6 Unsigned Extended Integer Data Format 


4.1.3 Tagged Word (32 bits) 


FIGURE 4-7 illustrates the tagged word data format. 





TW   |                  | tag |
      31               2 1   0


FIGURE 4-7 Tagged Word Data Format 
4.2 Floating-Point Data Formats


Single-precision, double-precision, and quad-precision floating-point data types are
described below.


4.2.1 Floating Point, Single Precision (32 bits)


FIGURE 4-8 illustrates the floating-point single-precision data format, and TABLE 4-3 
describes the formats. 


FS   | S | exp{7:0} | fraction{22:0} |
      31  30      23 22             0





FIGURE 4-8 Floating-Point Single-Precision Data Format 


TABLE 4-3 Floating-Point Single-Precision Format Definition

s = sign (1 bit)
e = biased exponent (8 bits)
f = fraction (23 bits)
u = undefined

Normalized value (0 < e < 255):   (-1)^{s} × 2^{e-127} × 1.f
Subnormal value (e = 0):          (-1)^{s} × 2^{-126} × 0.f
Zero (e = 0, f = 0):              (-1)^{s} × 0
Signalling NaN:                   s = u; e = 255 (max); f = .0uu--uu
                                  (At least one bit of the fraction must be nonzero)
Quiet NaN:                        s = u; e = 255 (max); f = .1uu--uu
-∞ (negative infinity):           s = 1; e = 255 (max); f = .000--00
+∞ (positive infinity):           s = 0; e = 255 (max); f = .000--00
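
To make the definitions in TABLE 4-3 concrete, here is a small Python sketch that classifies a 32-bit pattern and computes its value using the formulas above. The function name `decode_single` and the string labels it returns are illustrative choices, not terms from the specification.

```python
import struct


def decode_single(bits: int) -> tuple[str, float]:
    """Classify a 32-bit single-precision pattern and return (kind, value)."""
    s = (bits >> 31) & 1          # sign (1 bit)
    e = (bits >> 23) & 0xFF       # biased exponent (8 bits)
    f = bits & 0x7FFFFF           # fraction (23 bits)
    sign = -1 if s else 1
    if e == 255:
        if f == 0:
            return "infinity", sign * float("inf")
        # The most significant fraction bit separates quiet from signalling NaNs.
        kind = "quiet NaN" if f & (1 << 22) else "signalling NaN"
        return kind, float("nan")
    if e == 0:
        if f == 0:
            return "zero", sign * 0.0
        # 0.f × 2^-126 is the same as f × 2^-149.
        return "subnormal", sign * f * 2.0 ** -149
    return "normalized", sign * (1 + f / 2 ** 23) * 2.0 ** (e - 127)


# Cross-check one pattern against the host's own IEEE 754 decoding.
pattern = 0xC2F6E979
host_value = struct.unpack(">f", pattern.to_bytes(4, "big"))[0]
print(decode_single(pattern), host_value)
```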
4.2.2 Floating Point, Double Precision (64 bits)


FIGURE 4-9 illustrates both components (FD-0 and FD-1) of the floating-point
double-precision data format, and TABLE 4-4 describes the formats.


FD-0 | S | exp{10:0} | fraction{51:32} |
      31  30       20 19              0

FD-1 | fraction{31:0}                  |
      31                              0


FIGURE 4-9 Floating-Point Double-Precision Data Format


TABLE 4-4 Floating-Point Double-Precision Format Definition

s = sign (1 bit)
e = biased exponent (11 bits)
f = fraction (52 bits)
u = undefined

Normalized value (0 < e < 2047):  (-1)^{s} × 2^{e-1023} × 1.f
Subnormal value (e = 0):          (-1)^{s} × 2^{-1022} × 0.f
Zero (e = 0, f = 0):              (-1)^{s} × 0
Signalling NaN:                   s = u; e = 2047 (max); f = .0uu--uu
                                  (At least one bit of the fraction must be nonzero)
Quiet NaN:                        s = u; e = 2047 (max); f = .1uu--uu
-∞ (negative infinity):           s = 1; e = 2047 (max); f = .000--00
+∞ (positive infinity):           s = 0; e = 2047 (max); f = .000--00
4.2.3 Floating Point, Quad Precision (128 bits)


FIGURE 4-10 illustrates all four components (FQ-0 through FQ-3) of the floating-point
quad-precision data format, and TABLE 4-5 describes the formats.


FQ-0 | S | exp{14:0} | fraction{111:96} |
      31  30       16 15               0

FQ-1 | fraction{95:64}                  |
      31                               0

FQ-2 | fraction{63:32}                  |
      31                               0

FQ-3 | fraction{31:0}                   |
      31                               0


FIGURE 4-10 Floating-Point Quad-Precision Data Format


TABLE 4-5 Floating-Point Quad-Precision Format Definition

s = sign (1 bit)
e = biased exponent (15 bits)
f = fraction (112 bits)
u = undefined

Normalized value (0 < e < 32767): (-1)^{s} × 2^{e-16383} × 1.f
Subnormal value (e = 0):          (-1)^{s} × 2^{-16382} × 0.f
Zero (e = 0, f = 0):              (-1)^{s} × 0
Signalling NaN:                   s = u; e = 32767 (max); f = .0uu--uu
                                  (At least one bit of the fraction must be nonzero)
Quiet NaN:                        s = u; e = 32767 (max); f = .1uu--uu
-∞ (negative infinity):           s = 1; e = 32767 (max); f = .000--00
+∞ (positive infinity):           s = 0; e = 32767 (max); f = .000--00
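
The three format-definition tables share one structure and differ only in field widths. The Python sketch below, offered as an illustration under that observation, classifies a bit pattern for any exponent and fraction width and applies it to a quad-precision value assembled from its four 32-bit components. The names `classify_fp` and `decode_quad` are hypothetical. Exact rational arithmetic is used because host floating point cannot represent quad values, and the sign of zero is not preserved in the returned value.

```python
from fractions import Fraction


def classify_fp(bits: int, exp_bits: int, frac_bits: int):
    """Return (kind, exact value or None) for a sign/exponent/fraction pattern."""
    f = bits & ((1 << frac_bits) - 1)
    e = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    s = (bits >> (frac_bits + exp_bits)) & 1
    e_max = (1 << exp_bits) - 1       # 255, 2047, or 32767
    bias = e_max >> 1                 # 127, 1023, or 16383
    sign = -1 if s else 1
    if e == e_max:
        if f == 0:
            return ("-infinity" if s else "+infinity"), None
        quiet = (f >> (frac_bits - 1)) & 1
        return ("quiet NaN" if quiet else "signalling NaN"), None
    if e == 0:
        if f == 0:
            return "zero", Fraction(0)
        return "subnormal", sign * Fraction(f, 1 << frac_bits) * Fraction(2) ** (1 - bias)
    return "normalized", sign * (1 + Fraction(f, 1 << frac_bits)) * Fraction(2) ** (e - bias)


def decode_quad(fq0: int, fq1: int, fq2: int, fq3: int):
    """Assemble FQ-0 (most significant) through FQ-3 and classify the result."""
    bits = (fq0 << 96) | (fq1 << 64) | (fq2 << 32) | fq3
    return classify_fp(bits, exp_bits=15, frac_bits=112)
```

The same helper covers the single- and double-precision tables when called with `exp_bits=8, frac_bits=23` or `exp_bits=11, frac_bits=52`.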
4.2.4 Floating-Point Data Alignment in Memory and Registers


TABLE 4-6 describes the address and memory alignment for floating-point data.


TABLE 4-6 Floating-Point Doubleword and Quadword Alignment

                                            Memory Address                  Register Number
Subformat                                   Required       Address          Required       Register
Name       Subformat Field                  Alignment      (big-endian)*    Alignment      Number
FD-0       s:exp{10:0}:fraction{51:32}      0 mod 4 †      n                0 mod 2        f
FD-1       fraction{31:0}                   0 mod 4 †      n + 4            1 mod 2        f + 1 ◊
FQ-0       s:exp{14:0}:fraction{111:96}     0 mod 4 ‡      n                0 mod 4        f
FQ-1       fraction{95:64}                  0 mod 4 ‡      n + 4            1 mod 4        f + 1 ◊
FQ-2       fraction{63:32}                  0 mod 4 ‡      n + 8            2 mod 4        f + 2
FQ-3       fraction{31:0}                   0 mod 4 ‡      n + 12           3 mod 4        f + 3 ◊

* The Memory Address in this table applies to big-endian memory accesses. Word and byte order are reversed when
  little-endian accesses are used.

† Although a floating-point doubleword is required only to be word-aligned in memory, it is recommended that it be
  doubleword-aligned (that is, the address of its FD-0 word should be 0 mod 8 so that it can be accessed with
  doubleword loads/stores instead of multiple singleword loads/stores).

‡ Although a floating-point quadword is required only to be word-aligned in memory, it is recommended that it be
  quadword-aligned (that is, the address of its FQ-0 word should be 0 mod 16).

◊ Note that this 32-bit floating-point register is only directly addressable in the lower half of the register file
  (that is, if its register number is ≤ 31).





4.3 SIMD Data Formats


SIMD (single instruction/multiple data) instructions perform identical operations on
multiple data contained (“packed”) in each source operand. This section describes 
the data formats used by SIMD instructions. 


Conversion between the different SIMD data formats can be achieved through SIMD 
multiplication or by the use of the SIMD data formatting instructions. 
Programming | The SIMD data formats can be used in graphics calculations to
Note | represent intensity values for an image (e.g., α, B, G, R).

Intensity values are typically grouped in one of two ways, when
using SIMD data formats:

■ Band interleaved images, with the various color components
  of a point in the image stored together, and
■ Band sequential images, with all of the values for one color
  component stored together.


4.3.1 Uint8 SIMD Data Format


The Uint8 SIMD data format consists of four unsigned 8-bit integers contained in a
32-bit word (see FIGURE 4-11).


Uint8 SIMD | value  | value  | value  | value  |
            31    24 23    16 15     8 7      0


FIGURE 4-11 Uint8 SIMD Data Format


4.3.2 Int16 SIMD Data Formats


The Int16 SIMD data format consists of four signed 16-bit integers contained in a
64-bit word (see FIGURE 4-12).


Int16 SIMD | S | value   | S | value   | S | value   | S | value   |
            63  62     48 47  46     32 31  30     16 15  14      0


FIGURE 4-12 Int16 SIMD Data Format


4.3.3 Int32 SIMD Data Format


The Int32 SIMD data format consists of two signed 32-bit integers contained in a
64-bit word (see FIGURE 4-13).


Int32 SIMD | S | value             | S | value             |
            63  62               32 31  30                0


FIGURE 4-13 Int32 SIMD Data Format
Programming | The integer SIMD data formats can be used to hold fixed-point
Note | data. The position of the binary point in a SIMD datum is 
implied by the programmer and does not influence the 
computations performed by instructions that operate on that 
SIMD data format. 
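
The following Python sketch illustrates the Int16 SIMD layout and the point made in the note above: a lane-wise addition treats each 16-bit lane as a plain integer, so the binary point is purely a software convention. The helper names are hypothetical, and the wrap-around behavior of the lane-wise add (each lane computed modulo 2^16) is an assumption made for this example rather than a statement about any particular VIS instruction.

```python
def unpack_int16x4(word64: int) -> list[int]:
    """Split a 64-bit word into four signed 16-bit lanes, most significant first."""
    lanes = []
    for shift in (48, 32, 16, 0):
        v = (word64 >> shift) & 0xFFFF
        lanes.append(v - 0x10000 if v & 0x8000 else v)
    return lanes


def pack_int16x4(lanes) -> int:
    """Pack four integers into one 64-bit word, truncating each to 16 bits."""
    word = 0
    for v in lanes:
        word = (word << 16) | (v & 0xFFFF)
    return word


def partitioned_add16(a: int, b: int) -> int:
    """Add corresponding 16-bit lanes independently (no carry between lanes)."""
    sums = [x + y for x, y in zip(unpack_int16x4(a), unpack_int16x4(b))]
    return pack_int16x4(sums)


# Fixed-point view with 8 fraction bits (a software convention only):
# 1.5 is 0x0180 and 2.25 is 0x0240; their lane-wise sum 0x03C0 is 3.75.
total = partitioned_add16(0x0180_0000_0000_0000, 0x0240_0000_0000_0000)
print(unpack_int16x4(total)[0] / 256)
```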
CHAPTER 5


Registers 





The following registers are described in this chapter: 

■ General-Purpose R Registers on page 46.
■ Floating-Point Registers on page 52.
■ Floating-Point State Register (FSR) on page 58.
■ Ancillary State Registers on page 67. The following registers are included in this
  category:
    ■ 32-bit Multiply/Divide Register (Y) (ASR 0) on page 69.
    ■ Integer Condition Codes Register (CCR) (ASR 2) on page 69.
    ■ Address Space Identifier (ASI) Register (ASR 3) on page 71.
    ■ Tick (TICK) Register (ASR 4) on page 71.
    ■ Program Counters (PC, NPC) (ASR 5) on page 72.
    ■ Floating-Point Registers State (FPRS) Register (ASR 6) on page 73.
    ■ Performance Control Register (PCR^P) (ASR 16) on page 74.
    ■ Performance Instrumentation Counter (PIC) Register (ASR 17) on page 75.
    ■ General Status Register (GSR) (ASR 19) on page 76.
    ■ SOFTINT^P Register (ASRs 20, 21, 22) on page 77.
    ■ SOFTINT_SET^P Pseudo-Register (ASR 20) on page 78.
    ■ SOFTINT_CLR^P Pseudo-Register (ASR 21) on page 79.
    ■ Tick Compare (TICK_CMPR^P) Register (ASR 23) on page 79.
    ■ System Tick (STICK) Register (ASR 24) on page 80.
    ■ System Tick Compare (STICK_CMPR^P) Register (ASR 25) on page 81.
■ Register-Window PR State Registers on page 81. The following registers are
  included in this subcategory:
    ■ Current Window Pointer (CWP^P) Register (PR 9) on page 82.
    ■ Savable Windows (CANSAVE^P) Register (PR 10) on page 83.
    ■ Restorable Windows (CANRESTORE^P) Register (PR 11) on page 83.
    ■ Clean Windows (CLEANWIN^P) Register (PR 12) on page 83.
    ■ Other Windows (OTHERWIN^P) Register (PR 13) on page 84.
    ■ Window State (WSTATE^P) Register (PR 14) on page 84.
■ Non-Register-Window PR State Registers on page 86. The following registers are
  included in this subcategory:
    ■ Trap Program Counter (TPC^P) Register (PR 0) on page 86.
    ■ Trap Next PC (TNPC^P) Register (PR 1) on page 87.
    ■ Trap State (TSTATE^P) Register (PR 2) on page 88.
    ■ Trap Type (TT^P) Register (PR 3) on page 89.
    ■ Trap Base Address (TBA^P) Register (PR 5) on page 90.
    ■ Processor State (PSTATE^P) Register (PR 6) on page 90.
    ■ Trap Level Register (TL^P) (PR 7) on page 94.
    ■ Processor Interrupt Level (PIL^P) Register (PR 8) on page 95.
    ■ Global Level Register (GL^P) (PR 16) on page 96.


There are additional registers that may be accessed through ASIs; those registers are 
described in Chapter 10, Address Space Identifiers (ASIs). 





5.1 Reserved Register Fields


For convenience, some registers in this chapter are illustrated as fewer than 64 bits 
wide. Any bits not shown (or explicitly marked as reserved) are reserved for future 
extensions to the architecture. 


Such a reserved field within a register reads as zero in current implementations and, 
when written by software, should only be written with the value of that field 
previously read from that register or with the value zero. 


Programming | Software intended to run on future versions of the UltraSPARC 
Note | Architecture should not assume that reserved register fields will 
read as 0 or any other particular value. 
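
The rule above amounts to a read-modify-write discipline: change only the field of interest and write every other bit back exactly as it was read. A minimal Python sketch of that pattern follows; the function name and the field masks used in the example are hypothetical and do not describe any real register layout.

```python
def update_field(old_value: int, field_mask: int, new_field: int) -> int:
    """Return old_value with only the bits under field_mask replaced.

    All bits outside field_mask, including reserved ones, keep the value
    that was previously read from the register.
    """
    if field_mask <= 0:
        raise ValueError("field_mask must select at least one bit")
    shift = (field_mask & -field_mask).bit_length() - 1   # position of lowest mask bit
    if new_field < 0 or (new_field << shift) & ~field_mask:
        raise ValueError("new_field does not fit in the selected field")
    return (old_value & ~field_mask) | (new_field << shift)


# Hypothetical 4-bit field in bits 3:0; the upper bits are written back unchanged.
print(hex(update_field(0xFFFF_0000_0000_00F0, 0x0F, 0x5)))
```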





5.2 General-Purpose R Registers


An UltraSPARC Architecture virtual processor contains an array of general-purpose 
64-bit R registers. The array is partitioned into MAXPGL + 1 sets of eight global 
registers, plus N_REG_WINDOWS groups of 16 registers each. The value of 
N_REG_WINDOWS in an UltraSPARC Architecture implementation falls within the 
range 3 to 32 (inclusive). 


One set of 8 global registers is always visible. At any given time, a group of 24 
registers, known as a register window, is also visible. A register window comprises 
the 16 registers from the current 16-register group (referred to as 8 in registers and 8 
local registers), plus half of the registers from the next 16-register group (referred to 
as 8 out registers). See FIGURE 5-1. 
SPARC instructions use 5-bit fields to reference R registers. That is, 32 R registers are
visible to software at any moment. Which 32 out of the full set of R registers are
visible is described in the following sections. The visible 32 R registers are named
R[0] through R[31], illustrated in FIGURE 5-1.


R[24] - R[31]   ins
R[16] - R[23]   locals
R[ 8] - R[15]   outs
R[ 0] - R[ 7]   globals


FIGURE 5-1 General-Purpose Registers (as Visible at Any Given Time)
5.2.1 Global R Registers


Registers R[0] - R[7] refer to a set of eight registers called the global registers (labeled
g0 through g7). At any time, one of MAXPGL + 1 sets of eight registers is enabled and
can be accessed as the current set of global registers. The currently enabled set of
global registers is selected by the GL register. See Global Level Register (GL^P) (PR 16)
on page 96.


Global register zero (G0) always reads as zero; writes to it have no software-visible
effect.


5.2.2 Windowed R Registers


A set of 24 R registers that is visible as R[8]-R[31] at any given time is called a 
"register window". The registers that become R[8]-R[15] in a register window are 
called the out registers of the window. Note that the in registers of a register window 
become the out registers of an adjacent register window. See TABLE 5-1 and
FIGURE 5-2.


The names in, local, and out originate from the fact that the out registers are typically 
used to pass parameters from (out of) a calling routine and that the called routine 
receives those parameters as its in registers. 


TABLE 5-1 Window Addressing

Windowed Register Address     R Register Address
in[0] - in[7]                 R[24] - R[31]
local[0] - local[7]           R[16] - R[23]
out[0] - out[7]               R[ 8] - R[15]
global[0] - global[7]         R[ 0] - R[ 7]





V9 Compatibility | In the SPARC V9 architecture, the number of 16-register
Note | windowed register sets, N_REG_WINDOWS, ranges from 3 to 32
(impl. dep. #2-V8). The maximum global register set index in the
UltraSPARC Architecture, MAXPGL, ranges from 2 to 15. The
number of implemented global register sets is MAXPGL + 1. The
total number of R registers in a given UltraSPARC Architecture
implementation is:

(N_REG_WINDOWS × 16) + ((MAXPGL + 1) × 8)

Therefore, an UltraSPARC Architecture processor may contain
from 72 to 640 R registers.
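
The bounds of 72 and 640 can be checked by evaluating the formula at the extremes of both parameters. A short Python sketch (the function name is illustrative):

```python
def total_r_registers(n_reg_windows: int, maxpgl: int) -> int:
    """Total R registers: 16 per register window plus 8 per global register set."""
    if not 3 <= n_reg_windows <= 32:
        raise ValueError("N_REG_WINDOWS must be in the range 3..32")
    if not 2 <= maxpgl <= 15:
        raise ValueError("MAXPGL must be in the range 2..15")
    return n_reg_windows * 16 + (maxpgl + 1) * 8


print(total_r_registers(3, 2))     # smallest configuration
print(total_r_registers(32, 15))   # largest configuration
```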
The current window in the windowed portion of R registers is indicated by the
current window pointer (CWP) register. The CWP is decremented by the RESTORE
instruction and incremented by the SAVE instruction.


Window (CWP - 1)
R[31]
  :    ins
R[24]
R[23]
  :    locals
R[16]                 Window (CWP)
R[15]                 R[31]
  :    outs             :    ins
R[ 8]                 R[24]
                      R[23]
                        :    locals
                      R[16]                 Window (CWP + 1)
                      R[15]                 R[31]
                        :    outs             :    ins
                      R[ 8]                 R[24]
                                            R[23]
                                              :    locals
                                            R[16]
                                            R[15]
                                              :    outs
                                            R[ 8]

R[ 7]
  :    globals
R[ 0]
63              0


FIGURE 5-2 Three Overlapping Windows and Eight Global Registers


Overlapping Windows. Each window shares its ins with one adjacent window
and its outs with another. The outs of the CWP - 1 (modulo N_REG_WINDOWS)
window are addressable as the ins of the current window, and the outs in the current
window are the ins of the CWP + 1 (modulo N_REG_WINDOWS) window. The locals
are unique to each window.


Register address o, where 8 ≤ o ≤ 15, refers to exactly the same out register before the
register window is advanced by a SAVE instruction (CWP is incremented by 1
(modulo N_REG_WINDOWS)) as does register address o+16 after the register window
is advanced. Likewise, register address i, where 24 ≤ i ≤ 31, refers to exactly the same
in register before the register window is restored by a RESTORE instruction (CWP is
decremented by 1 (modulo N_REG_WINDOWS)) as does register address i-16 after the
window is restored. See FIGURE 5-2 on page 49 and FIGURE 5-3 on page 51.
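
The overlap can be modeled with a flat array of N_REG_WINDOWS × 16 windowed registers. The Python sketch below uses one possible numbering of that array, which is an assumption made purely for illustration (real implementations are free to organize the register file differently), and shows that register address o before a SAVE and o+16 after it select the same storage.

```python
def physical_index(r: int, cwp: int, n_reg_windows: int) -> int:
    """Map windowed register address r (8..31) to an index in a flat array.

    Hypothetical layout: window w owns 16 slots starting at w * 16, holding
    its ins (offsets 0..7) and locals (offsets 8..15). Its outs are the ins
    of window w + 1, modulo the number of windows.
    """
    if not 8 <= r <= 31:
        raise ValueError("only R[8]..R[31] are windowed")
    if r >= 24:                       # ins
        window, offset = cwp, r - 24
    elif r >= 16:                     # locals
        window, offset = cwp, 8 + (r - 16)
    else:                             # outs, shared with the next window's ins
        window, offset = cwp + 1, r - 8
    return (window % n_reg_windows) * 16 + offset


N = 8
for cwp in range(N):
    after_save = (cwp + 1) % N
    for o in range(8, 16):
        # out register o before SAVE is in register o+16 after SAVE
        assert physical_index(o, cwp, N) == physical_index(o + 16, after_save, N)
```

With this numbering, the outs of window N_REG_WINDOWS - 1 land on the ins of window 0, which is the wrap-around case.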


To application software, the virtual processor appears to provide an infinitely-deep 
stack of register windows. 


Programming | Since the procedure call instructions (CALL and JMPL) do not 
Note | change the CWP, a procedure can be called without changing 
the window. See the section “Leaf-Procedure Optimization” in 
Software Considerations, contained in the separate volume 
UltraSPARC Architecture Application Notes.


Since CWP arithmetic is performed modulo N_REG_WINDOWS, the highest-numbered
implemented window overlaps with window 0. The outs of window
N_REG_WINDOWS - 1 are the ins of window 0. Implemented windows are numbered
contiguously from 0 through N_REG_WINDOWS - 1.


Because the windows overlap, the number of windows available to software is 1 less
than the number of implemented windows; that is, N_REG_WINDOWS - 1. When the
register file is full, the outs of the newest window are the ins of the oldest window,
which still contains valid data.


Window overflow is detected by the CANSAVE register, and window underflow is 
detected by the CANRESTORE register, both of which are controlled by privileged 
software. A window overflow (underflow) condition causes a window spill (fill) 
trap. 


When a new register window is made visible through use of a SAVE instruction, the
local and out registers are guaranteed to contain either zeroes or valid data from the
current context. If software executes a RESTORE and later executes a SAVE, then the
contents of the resulting window's local and out registers are not guaranteed to be
preserved between the RESTORE and the SAVE¹. Those registers may even have
been written with "dirty" data, that is, data created by software running in a
different context. However, if the clean window protocol is being used, system
software must guarantee that registers in the current window after a SAVE always
contain only zeroes or valid data from that context. See Clean Windows
(CLEANWIN^P) Register (PR 12) on page 83, Savable Windows (CANSAVE^P) Register
(PR 10) on page 83, and Restorable Windows (CANRESTORE^P) Register (PR 11) on
page 83.


Implementation | An UltraSPARC Architecture virtual processor supports the 
Note | guarantee in the preceding paragraph of "either zeroes or valid 
data from the current context"; it may do so either in hardware 
or in a combination of hardware and system software. 


1. For example, any of those 16 registers might be altered due to the occurrence of a trap between the RESTORE
and the SAVE, or might be altered during the RESTORE operation due to the way that register windows are 
implemented. After a RESTORE instruction executes, software must assume that the values of the affected 16 
registers from before the RESTORE are unrecoverable. 
Register Window Management Instructions on page 116 describes how the windowed
integer registers are managed.


CWP = 0 (current window pointer)
CANSAVE = 4

CANSAVE + CANRESTORE + OTHERWIN = N_REG_WINDOWS - 2


The current window (window 0) and the overlap window (window 5) account for the
two windows in the right side of the equation. The "overlap window" is the window
that must remain unused because its ins and outs overlap two other valid windows.


FIGURE 5-3 Windowed R Registers for N_REG_WINDOWS = 8
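
The identity shown in FIGURE 5-3 can be exercised with a small state model. The Python sketch below assumes the usual SAVE and RESTORE bookkeeping (SAVE decrements CANSAVE and increments CANRESTORE, RESTORE does the reverse, and a zero count causes a spill or fill trap instead). The class and method names are hypothetical, and clean-window traps and trap handlers are not modeled.

```python
class WindowState:
    """Toy model of CWP, CANSAVE, CANRESTORE, and OTHERWIN."""

    def __init__(self, n_reg_windows, cwp, cansave, canrestore, otherwin):
        self.n = n_reg_windows
        self.cwp = cwp
        self.cansave = cansave
        self.canrestore = canrestore
        self.otherwin = otherwin
        self._check()

    def _check(self):
        # CANSAVE + CANRESTORE + OTHERWIN = N_REG_WINDOWS - 2
        if self.cansave + self.canrestore + self.otherwin != self.n - 2:
            raise ValueError("window state violates the invariant")

    def save(self) -> str:
        if self.cansave == 0:
            return "spill trap"       # window overflow; state is left unchanged
        self.cwp = (self.cwp + 1) % self.n
        self.cansave -= 1
        self.canrestore += 1
        self._check()
        return "ok"

    def restore(self) -> str:
        if self.canrestore == 0:
            return "fill trap"        # window underflow; state is left unchanged
        self.cwp = (self.cwp - 1) % self.n
        self.cansave += 1
        self.canrestore -= 1
        self._check()
        return "ok"


# State used for the eight-window example: 4 + 1 + 1 = 8 - 2.
state = WindowState(8, cwp=0, cansave=4, canrestore=1, otherwin=1)
print(state.restore(), state.cwp)     # RESTORE moves from window 0 to window 7
```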
In FIGURE 5-3, N_REG_WINDOWS = 8. The eight global registers are not illustrated.
CWP = 0, CANSAVE = 4, OTHERWIN = 1, and CANRESTORE = 1. If the procedure
using window w0 executes a RESTORE, then window w7 becomes the current
window. If the procedure using window w0 executes a SAVE, then window w1
becomes the current window.


5.2.3 Special R Registers


The use of two of the R registers is fixed, in whole or in part, by the architecture: 

■ The value of R[0] is always zero; writes to it have no program-visible effect.
■ The CALL instruction writes its own address into register R[15] (out register 7).


Register-Pair Operands. LDTW, LDTWA, STTW, and STTWA instructions access 
a pair of words ("twin words") in adjacent R registers and require even-odd register 
alignment. The least significant bit of an R register number in these instructions is 
unused and must always be supplied as 0 by software. 


When the R[0]-R[1] register pair is used as a destination in LDTW or LDTWA, only 
R[1] is modified. When the R[0]-R[1] register pair is used as a source in STTW or 
STTWA, 0 is read from R[0], so 0 is written to the 32-bit word at the lowest address, 
and the least significant 32 bits of R[1] are written to the 32-bit word at the highest 
address. 


An attempt to execute an LDTW, LDTWA, STTW, or STTWA instruction that refers
to a misaligned (odd) destination register number causes an illegal_instruction trap.





5.3 Floating-Point Registers


The floating-point register set consists of sixty-four 32-bit registers, which may be 
accessed as follows: 


■ Sixteen 128-bit quad-precision registers, referenced as F_Q[0], F_Q[4], ..., F_Q[60]
■ Thirty-two 64-bit double-precision registers, referenced as F_D[0], F_D[2], ..., F_D[62]
■ Thirty-two 32-bit single-precision registers, referenced as F_S[0], F_S[1], ..., F_S[31]
  (only the lower half of the floating-point register file can be accessed as
  single-precision registers)


The floating-point registers are arranged so that some of them overlap, that is, are
aliased. The layout and numbering of the floating-point registers are shown in
TABLE 5-2. Unlike the windowed R registers, all of the floating-point registers are
accessible at any time. The floating-point registers can be read and written by
floating-point operate (FPop1/FPop2 format) instructions, by load/store single/
double/quad floating-point instructions, by VIS™ instructions, and by block load
and block store instructions.
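
The aliasing can be summarized by two small mapping functions. This Python sketch assumes the arrangement described above, in which a double-precision register in the lower half overlays an even-odd pair of single-precision registers and a quad-precision register overlays two consecutive double-precision registers. The function names are illustrative only.

```python
def double_components(d: int):
    """Single-precision registers aliased by double register number d.

    Returns (upper word register, lower word register), or None for the
    upper half of the register file, which has no single-precision names.
    """
    if d % 2 or not 0 <= d <= 62:
        raise ValueError("double register numbers are even, 0..62")
    return (d, d + 1) if d < 32 else None


def quad_components(q: int) -> tuple[int, int]:
    """Double-precision registers aliased by quad register number q."""
    if q % 4 or not 0 <= q <= 60:
        raise ValueError("quad register numbers are multiples of 4, 0..60")
    return q, q + 2                   # upper 64 bits, lower 64 bits


print(double_components(22), quad_components(20))
```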


TABLE 5-2 Floating-Point Registers, with Aliasing

Single Precision              Double Precision               Quad Precision
(32-bit)                      (64-bit)                       (128-bit)

          Assembly                      Assembly                       Assembly
Register  Language   Bits     Register  Language   Bits      Register  Language
F_S[0]    %f0        63:32    F_D[0]    %d0        127:64    F_Q[0]    %q0
F_S[1]    %f1        31:0
F_S[2]    %f2        63:32    F_D[2]    %d2        63:0
F_S[3]    %f3        31:0
F_S[4]    %f4        63:32    F_D[4]    %d4        127:64    F_Q[4]    %q4
F_S[5]    %f5        31:0
F_S[6]    %f6        63:32    F_D[6]    %d6        63:0
F_S[7]    %f7        31:0
F_S[8]    %f8        63:32    F_D[8]    %d8        127:64    F_Q[8]    %q8
F_S[9]    %f9        31:0
F_S[10]   %f10       63:32    F_D[10]   %d10       63:0
F_S[11]   %f11       31:0
F_S[12]   %f12       63:32    F_D[12]   %d12       127:64    F_Q[12]   %q12
F_S[13]   %f13       31:0
F_S[14]   %f14       63:32    F_D[14]   %d14       63:0
F_S[15]   %f15       31:0
F_S[16]   %f16       63:32    F_D[16]   %d16       127:64    F_Q[16]   %q16
F_S[17]   %f17       31:0
F_S[18]   %f18       63:32    F_D[18]   %d18       63:0
F_S[19]   %f19       31:0
F_S[20]   %f20       63:32    F_D[20]   %d20       127:64    F_Q[20]   %q20
F_S[21]   %f21       31:0
F_S[22]   %f22       63:32    F_D[22]   %d22       63:0
F_S[23]   %f23       31:0
F_S[24]   %f24       63:32    F_D[24]   %d24       127:64    F_Q[24]   %q24
F_S[25]   %f25       31:0
F_S[26]   %f26       63:32    F_D[26]   %d26       63:0
F_S[27]   %f27       31:0
F_S[28]   %f28       63:32    F_D[28]   %d28       127:64    F_Q[28]   %q28
F_S[29]   %f29       31:0
F_S[30]   %f30       63:32    F_D[30]   %d30       63:0
F_S[31]   %f31       31:0
—         —          —        F_D[32]   %d32       127:64    F_Q[32]   %q32
—         —          —        F_D[34]   %d34       63:0
—         —          —        F_D[36]   %d36       127:64    F_Q[36]   %q36
—         —          —        F_D[38]   %d38       63:0
—         —          —        F_D[40]   %d40       127:64    F_Q[40]   %q40
—         —          —        F_D[42]   %d42       63:0
—         —          —        F_D[44]   %d44       127:64    F_Q[44]   %q44
—         —          —        F_D[46]   %d46       63:0
—         —          —        F_D[48]   %d48       127:64    F_Q[48]   %q48
—         —          —        F_D[50]   %d50       63:0
—         —          —        F_D[52]   %d52       127:64    F_Q[52]   %q52
—         —          —        F_D[54]   %d54       63:0
—         —          —        F_D[56]   %d56       127:64    F_Q[56]   %q56
—         —          —        F_D[58]   %d58       63:0
—         —          —        F_D[60]   %d60       127:64    F_Q[60]   %q60
—         —          —        F_D[62]   %d62       63:0

5.3.1 Floating-Point Register Number Encoding


Register numbers for single, double, and quad registers are encoded differently in 
the 5-bit register number field of a floating-point instruction. If the bits in a register 
number field are labeled b{4} ... b{0} (where b{4} is the most significant bit of the 
register number), the encoding of floating-point register numbers into 5-bit 
instruction fields is as given in TABLE 5-3. 


TABLE 5-3 Floating-Point Register Number Encoding

Register Operand                                  Encoding in a 5-bit Register Field in an
Type               Full 6-bit Register Number     Instruction
Single             0    b{4} b{3} b{2} b{1} b{0}  b{4} b{3} b{2} b{1} b{0}
Double             b{5} b{4} b{3} b{2} b{1} 0     b{4} b{3} b{2} b{1} b{5}
Quad               b{5} b{4} b{3} b{2} 0    0     b{4} b{3} b{2} 0    b{5}
SPARC V8 | In the SPARC V8 architecture, bit 0 of double and quad register
Compatibility | numbers encoded in instruction fields was required to be zero.
Note | Therefore, all SPARC V8 floating-point instructions can run
unchanged on an UltraSPARC Architecture virtual processor,
using the encoding in TABLE 5-3.
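
The encoding in TABLE 5-3 moves bit 5 of a double or quad register number into bit 0 of the 5-bit instruction field. A Python sketch of both directions follows; the function names and the string values used to select the operand type are illustrative, and a `ValueError` stands in for whatever the hardware does with an invalid register number.

```python
def encode_fp_reg(regnum: int, precision: str) -> int:
    """Encode a 6-bit floating-point register number into a 5-bit field."""
    if precision == "single":
        if not 0 <= regnum <= 31:
            raise ValueError("single registers are numbered 0..31")
        return regnum
    alignment = {"double": 2, "quad": 4}[precision]
    if not 0 <= regnum <= 63 or regnum % alignment:
        raise ValueError("misaligned or out-of-range register number")
    # Keep b{4}..b{1} in place and put b{5} into bit 0 of the field.
    return (regnum & 0b11110) | (regnum >> 5)


def decode_fp_reg(field: int, precision: str) -> int:
    """Recover the 6-bit register number from a 5-bit instruction field."""
    field &= 0b11111
    if precision == "single":
        return field
    if precision == "quad" and field & 0b00010:
        raise ValueError("bit 1 of a quad register field must be 0")
    return ((field & 1) << 5) | (field & 0b11110)


print(encode_fp_reg(32, "double"), encode_fp_reg(6, "double"))
```

Lower-half even register numbers encode to themselves, which is why the SPARC V8 encodings remain valid.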


5.3.2 Double and Quad Floating-Point Operands


A single 32-bit F register can hold one single-precision operand; a double-precision 
operand requires an aligned pair of F registers, and a quad-precision operand 
requires an aligned quadruple of F registers. At a given time, the floating-point 
registers can hold a maximum of 32 single-precision, 16 double-precision, or 8 quad- 
precision values in the lower half of the floating-point register file, plus an 
additional 16 double-precision or 8 quad-precision values in the upper half, or 
mixtures of the three sizes. 
Programming | The upper 16 double-precision (upper 8 quad-precision)
Note | floating-point registers cannot be directly loaded by 32-bit load
instructions. Therefore, double- or quad-precision data that is
only word-aligned in memory cannot be directly loaded into the
upper registers with LDF[A] instructions. The following
guidelines are recommended:

1. Whenever possible, align floating-point data in memory on
   proper address boundaries. If access to a datum is required to
   be atomic, the datum must be properly aligned.

2. If a double- or quad-precision datum is not properly aligned
   in memory but is still aligned on a 4-byte boundary, and access
   to the datum in memory is not required to be atomic, then
   software should attempt to allocate a register for it in the
   lower half of the floating-point register file so that the datum
   can be loaded with multiple LDF[A] instructions.

3. If the only available registers for such a datum are located in
   the upper half of the floating-point register file and access to
   the datum in memory is not required to be atomic, the
   word-aligned datum can be loaded into them by one of two
   methods:

   ■ Load the datum into an upper register by using multiple
     LDF[A] instructions to first load it into a double- or
     quad-precision register in the lower half of the floating-point
     register file, then copy that register to the desired
     destination register in the upper half.

   ■ Use an LDDF[A] or LDQF[A] instruction to perform the
     load directly into the upper floating-point register,
     understanding that use of these instructions on poorly
     aligned data can cause a trap (LDDF_mem_address_not_aligned) on
     some implementations, possibly slowing down program
     execution significantly.


Programming | If an UltraSPARC Architecture 2005 implementation does not
Note | implement a particular quad floating-point arithmetic operation
in hardware and an invalid quad register operand is specified,
per FSR.ftt priorities in TABLE 5-7, the fp_exception_other
exception occurs with FSR.ftt = 3 (unimplemented_FPop)
instead of with FSR.ftt = 6 (invalid_fp_register).


Implementation | UltraSPARC Architecture 2005 implementations do not
Note | implement any quad floating-point arithmetic operations in
hardware. Therefore, an attempt to execute any of them results
in a trap on the fp_exception_other exception with FSR.ftt = 3
(unimplemented_FPop).
5.4 Floating-Point State Register (FSR)


The Floating-Point State register (FSR) fields, illustrated in FIGURE 5-4, contain FPU 
mode and status information. The lower 32 bits of the FSR are read and written by 
the (deprecated) STFSR and LDFSR instructions, respectively. The 64-bit FSR 
register is read by the STXFSR instruction and written by the LDXFSR instruction. 
The ver, ftt, qne, unimplemented (for example, ns), and reserved ("—") fields of 
FSR are not modified by either LDFSR or LDXFSR. 
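
As a rough illustration of the field layout and of the rule that LDXFSR leaves ver, ftt, qne, ns, and the reserved bits untouched, the following Python sketch models the FSR as a plain 64-bit integer. The names `FSR_FIELDS`, `decode_fsr`, and `ldxfsr` are hypothetical helpers, not part of the architecture. The bit positions are taken from FIGURE 5-4 and the subsections of this chapter, with ns assumed at bit 22 and qne at bit 13.

```python
# Bit ranges (high, low) of the FSR fields; ns = 22 and qne = 13 are assumed
# from the field boundaries shown in FIGURE 5-4.
FSR_FIELDS = {
    "fcc3": (37, 36), "fcc2": (35, 34), "fcc1": (33, 32),
    "rd": (31, 30), "tem": (27, 23), "ns": (22, 22),
    "ver": (19, 17), "ftt": (16, 14), "qne": (13, 13),
    "fcc0": (11, 10), "aexc": (9, 5), "cexc": (4, 0),
}

# Fields that an LDXFSR is allowed to change (everything except
# ver, ftt, qne, ns, and the reserved bits).
WRITABLE = ("fcc3", "fcc2", "fcc1", "rd", "tem", "fcc0", "aexc", "cexc")


def field_mask(name):
    """Return the in-place bit mask of one FSR field."""
    hi, lo = FSR_FIELDS[name]
    return ((1 << (hi - lo + 1)) - 1) << lo


def decode_fsr(fsr):
    """Split a 64-bit FSR value into a dict of field values."""
    return {name: (fsr & field_mask(name)) >> FSR_FIELDS[name][1]
            for name in FSR_FIELDS}


LDXFSR_MASK = 0
for _name in WRITABLE:
    LDXFSR_MASK |= field_mask(_name)


def ldxfsr(old_fsr, new_value):
    """Model of LDXFSR: only the writable fields take the new value."""
    return (old_fsr & ~LDXFSR_MASK) | (new_value & LDXFSR_MASK)
```

Loading an all-ones value through `ldxfsr` therefore changes rd, tem, the fccn fields, aexc, and cexc, while ver and ftt keep their previous contents and the reserved bits stay zero.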


FIGURE 5-4 FSR Fields (bit positions, high to low): reserved 63-38, fcc3 37-36,
fcc2 35-34, fcc1 33-32, rd 31-30, reserved 29-28, tem 27-23, ns 22,
reserved 21-20, ver 19-17, ftt 16-14, qne 13, reserved 12, fcc0 11-10,
aexc 9-5, cexc 4-0


Bits 63-38, 29-28, 21-20, and 12 of FSR are reserved. When read by an STXFSR
instruction, these bits always read as zero.


Programming | For future compatibility, software should issue LDXFSR
Note | instructions only with zero values in these bits or values of these
bits exactly as read by a previous STXFSR.


The subsections on pages 58 through 67 describe the remaining fields in the FSR.


5.4.1 Floating-Point Condition Codes (fcc0, fcc1, fcc2, fcc3)


The four sets of floating-point condition code fields are labeled fcc0, fcc1, fcc2, and 
fcc3 (fccn refers to any of the floating-point condition code fields). 


The fcc0 field consists of bits 11 and 10 of the FSR, fcc1 consists of bits 33 and 32,
fcc2 consists of bits 35 and 34, and fcc3 consists of bits 37 and 36. Execution of a
floating-point compare instruction (FCMP or FCMPE) updates one of the fccn fields
in the FSR, as selected by the compare instruction. The fccn fields are read by
STXFSR and written by LDXFSR. The fcc0 field can also be read and written by
STFSR and LDFSR, respectively. FBfcc and FBPfcc instructions base their control
transfers on the content of these fields. The MOVcc and FMOVcc instructions can
conditionally copy a register, based on the contents of these fields.
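
The placement of the four fccn fields and the relation encoding of TABLE 5-5 can be sketched in Python as follows. This is a minimal model, not a definition: `read_fcc`, `write_fcc`, and `compare` are hypothetical names, and `compare` uses Python floats to stand in for the F register operands. It does not model the fp_exception_ieee_754 case, in which fccn is left unchanged.

```python
# Position of the low bit of each fccn field within the 64-bit FSR.
FCC_SHIFT = {0: 10, 1: 32, 2: 34, 3: 36}

# Encoding of the relation, as listed in TABLE 5-5.
RELATION = {0: "=", 1: "<", 2: ">", 3: "?"}


def read_fcc(fsr, n):
    """Extract the 2-bit fccn field from an FSR value."""
    return (fsr >> FCC_SHIFT[n]) & 0b11


def write_fcc(fsr, n, value):
    """Return the FSR value with fccn replaced by a 2-bit value."""
    shift = FCC_SHIFT[n]
    return (fsr & ~(0b11 << shift)) | ((value & 0b11) << shift)


def compare(a, b):
    """Return the fccn encoding for comparing two floats (3 = unordered)."""
    if a != a or b != b:      # a NaN never compares equal to itself
        return 3
    if a == b:
        return 0
    return 1 if a < b else 2
```

For example, `write_fcc(0, 2, compare(1.0, 2.0))` places the value 1 ("less than") in bits 35 and 34, leaving fcc0, fcc1, and fcc3 at zero.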


In TABLE 5-5, f_rs1 and f_rs2 correspond to the single, double, or quad values in the
floating-point registers specified by a floating-point compare instruction's rs1 and
rs2 fields. The question mark (?) indicates an unordered relation, which is true if
either f_rs1 or f_rs2 is a signalling NaN or a quiet NaN. If FCMP or FCMPE generates
an fp_exception_ieee_754 exception, then fccn is unchanged.


TABLE 5-5 Floating-Point Condition Codes (fccn) Fields of FSR

Content of fccn   Indicated Relation (FCMP*, FCMPE*)
0                 F[rs1] = F[rs2]
1                 F[rs1] < F[rs2]
2                 F[rs1] > F[rs2]
3                 F[rs1] ? F[rs2] (unordered)


5.4.2 Rounding Direction (rd)


Bits 31 and 30 select the rounding direction for floating-point results according to
IEEE Std 754-1985. TABLE 5-6 shows the encodings.


TABLE 5-6 Rounding Direction (rd) Field of FSR

rd   Round Toward
0    Nearest (even, if tie)
1    0
2    +∞
3    -∞


If the interval mode bit of the General Status register has a value of 1 (GSR.im = 1),
then the value of FSR.rd is ignored and floating-point results are instead rounded
according to GSR.irnd. See General Status Register (GSR) (ASR 19) on page 76 for
further details.


5.4.3 Trap Enable Mask (tem)

Bits 27 through 23 are enable bits for each of the five IEEE-754 floating-point
exceptions that can be indicated in the current exception field (cexc). See FIGURE 5-6
on page 66. If a floating-point instruction generates one or more exceptions and the
tem bit corresponding to any of the exceptions is 1, then this condition causes an
fp_exception_ieee_754 trap. A tem bit value of 0 prevents the corresponding IEEE
754 exception type from generating a trap.


5.4.4 Nonstandard Floating-Point (ns)


On an UltraSPARC Architecture 2005 processor, FSR.ns is a reserved bit; it always
reads as 0 and writes to it are ignored. (impl. dep. #18-V8)


5.4.5 FPU Version (ver)


IMPL. DEP. #19-V8: Bits 19 through 17 identify one or more particular
implementations of the FPU architecture.


For each SPARC V9 IU implementation, there may be one or more FPU
implementations, or none. FSR.ver identifies the particular FPU implementation
present. The value in FSR.ver for each implementation is strictly implementation
dependent. Consult the appropriate document for each implementation for its
setting of FSR.ver.


FSR.ver = 7 is reserved to indicate that no hardware floating-point controller is
present.


The ver field of FSR is read-only; it cannot be modified by the LDFSR or LDXFSR
instructions.


5.4.6 Floating-Point Trap Type (ftt)


Several conditions can cause a floating-point exception trap. When a floating-point
exception trap occurs, FSR.ftt (FSR{16:14}) identifies the cause of the exception, the
"floating-point trap type." After a floating-point exception occurs, FSR.ftt encodes
the type of the floating-point exception until it is cleared (set to 0) by execution of an
STFSR, STXFSR, or FPop that does not cause a trap due to a floating-point exception.


The FSR.ftt field can be read by a STFSR or STXFSR instruction. The LDFSR and 
LDXFSR instructions do not affect FSR.ftt. 
Privileged software that handles floating-point traps must execute an STFSR (or
STXFSR) to determine the floating-point trap type. STFSR and STXFSR set FSR.ftt to
zero after the store completes without error. If the store generates an error and does
not complete, FSR.ftt remains unchanged.


Programming | Neither LDFSR nor LDXFSR can be used for the purpose of
Note | clearing the ftt field, since both leave ftt unchanged. However,
executing a nontrapping floating-point operate (FPop)
instruction such as "fmovs %f0,%f0" prior to returning to
nonprivileged mode will zero FSR.ftt. The ftt field remains zero
until the next FPop instruction completes execution.





FSR.ftt encodes the primary condition ("floating-point trap type") that caused the
generation of an fp_exception_other or fp_exception_ieee_754 exception. It is
possible for more than one such condition to occur simultaneously; in such a case,
only the highest-priority condition will be encoded in FSR.ftt. The conditions
leading to fp_exception_other and fp_exception_ieee_754 exceptions, their relative
priorities, and the corresponding FSR.ftt values are listed in TABLE 5-7. Note that the
FSR.ftt values 4 and 5 were defined in the SPARC V9 architecture but are not
currently in use, and that the value 7 is reserved for future architectural use.


TABLE 5-7 FSR Floating-Point Trap Type (ftt) Field

Condition Detected During   Relative Priority   FSR.ftt Set
Execution of an FPop        (1 = highest)       to Value      Exception Generated
unimplemented_FPop          10                  3             fp_exception_other
invalid_fp_register         20                  6             fp_exception_other
unfinished_FPop             30                  2             fp_exception_other
IEEE_754_exception          40                  1             fp_exception_ieee_754
Reserved                    —                   4, 5, 7       —
(none detected)             —                   0             —
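
The priority rule of TABLE 5-7 (only the highest-priority detected condition is encoded in FSR.ftt) can be sketched as a small lookup. The function name `resolve_ftt` and the use of a Python set to represent simultaneously detected conditions are illustrative assumptions only.

```python
# (relative priority, condition, FSR.ftt value, exception) rows of TABLE 5-7.
# A smaller priority number means a higher priority.
FTT_TABLE = [
    (10, "unimplemented_FPop", 3, "fp_exception_other"),
    (20, "invalid_fp_register", 6, "fp_exception_other"),
    (30, "unfinished_FPop", 2, "fp_exception_other"),
    (40, "IEEE_754_exception", 1, "fp_exception_ieee_754"),
]


def resolve_ftt(detected):
    """Return (ftt, exception) for a set of detected condition names.

    With nothing detected the result is (0, None), matching the
    "(none detected)" row of the table.
    """
    for _priority, name, ftt, exception in sorted(FTT_TABLE):
        if name in detected:
            return ftt, exception
    return 0, None
```

With this model, an unimplemented quad FPop that also names a misaligned register yields ftt = 3 rather than ftt = 6, since unimplemented_FPop has the higher priority.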





The IEEE_754_exception, unimplemented_FPop, and unfinished_FPop conditions
will likely arise occasionally in the normal course of computation and must be
recoverable by system software.


When a floating-point trap occurs, the following results are observed by user 
software: 


1. The value of aexc is unchanged. 


2. When an fp_exception_ieee_754 trap occurs, a bit corresponding to the trapping
exception is set in cexc. On other traps, the value of cexc is unchanged.


3. The source and destination registers are unchanged. 


4. The value of fccn is unchanged. 
The foregoing describes the result seen by a user trap handler if an IEEE exception is
signalled, either immediately from an fp_exception_ieee_754 exception or after
recovery from an unfinished_FPop or unimplemented_FPop. In either case, cexc as
seen by the trap handler reflects the exception causing the trap.


In the cases of an fp_exception_other exception with a floating-point trap type of
unfinished_FPop or unimplemented_FPop that does not subsequently generate an
IEEE trap, the recovery software should set cexc, aexc, and the destination register
or fccn, as appropriate.


ftt = 1 (IEEE_754_exception). The IEEE_754_exception floating-point trap type
indicates the occurrence of a floating-point exception conforming to IEEE Std
754-1985. The IEEE 754 exception type (overflow, inexact, etc.) is set in the cexc field. The
aexc and fccn fields and the destination F register are unchanged.


ftt = 2 (unfinished_FPop). The unfinished_FPop floating-point trap type indicates
that the virtual processor was unable to generate correct results or that exceptions as
defined by IEEE Std 754-1985 have occurred. In cases where exceptions have
occurred, the cexc field is unchanged.


IMPL. DEP. #248-U3: The conditions under which an fp_exception_other exception
with floating-point trap type of unfinished_FPop can occur are implementation
dependent. An implementation may cause fp_exception_other with
FSR.ftt = unfinished_FPop under a different (but specified) set of conditions.


ftt = 3 (unimplemented_FPop). The unimplemented_FPop floating-point trap
type indicates that the virtual processor decoded an FPop that it does not implement
in hardware. In this case, the cexc field is unchanged.


For example, all quad-precision FPop variations in an UltraSPARC Architecture 2005
virtual processor cause an fp_exception_other exception, setting
FSR.ftt = unimplemented_FPop.


Forward | The next revision of the UltraSPARC Architecture is expected to
Compatibility | eliminate "unimplemented_FPop", to simplify handling of
Note | unimplemented instructions. At that point, all conditions which
currently cause fp_exception_other with FSR.ftt = 3 will
cause an illegal_instruction exception, instead. FSR.ftt = 3 and
the trap type associated with fp_exception_other will become
reserved for other possible future uses.





62 UltraSPARC Architecture 2005 * Draft D0.9.2, 19 Jun 2008


ftt = 4 (Reserved).


SPARC V9 | In the SPARC V9 architecture, FSR.ftt = 4 was defined to be
Compatibility | "sequence_error", for use with certain error conditions
Note | associated with a floating-point queue (FQ). Since UltraSPARC
Architecture implementations generate precise (rather than
deferred) traps for floating-point operations, an FQ is not
needed; therefore sequence_error conditions cannot occur and
ftt = 4 has been returned to the pool of reserved ftt values.


ftt = 5 (Reserved).


SPARC V9 | In the SPARC V9 architecture, FSR.ftt = 5 was defined to be
Compatibility | "hardware_error", for use with hardware error conditions
Note | associated with an external floating-point unit (FPU) operating
asynchronously to the main processor (IU). Since UltraSPARC
Architecture processors are now implemented with an integral
FPU, a hardware error in the FPU can generate an exception
directly, rather than indirectly report the error through FSR.ftt
(as was required when FPUs were external to IUs). Therefore,
ftt = 5 has been returned to the pool of reserved ftt values.


ftt = 6 (invalid_fp_register). This trap type indicates that one or more F register
operands of an FPop are misaligned; that is, a quad-precision register number is not
0 mod 4. An implementation generates an fp_exception_other trap with FSR.ftt =
invalid_fp_register in this case.


Implementation | Per FSR.ftt priorities in TABLE 5-7, if an UltraSPARC Architecture
Note | 2005 processor does not implement a particular quad FPop in
hardware, that FPop generates an fp_exception_other exception
with FSR.ftt = 3 (unimplemented_FPop) instead of
fp_exception_other with FSR.ftt = 6 (invalid_fp_register),
regardless of the specified F registers.


5.4.7 FQ Not Empty (qne)


Since UltraSPARC Architecture 2005 virtual processors do not implement a
floating-point queue, FSR.qne always reads as zero and writes to FSR.qne are ignored.


5.4.8 Accrued Exceptions (aexc)


Bits 9 through 5 accumulate IEEE 754 floating-point exceptions as long as
floating-point exception traps are disabled through the tem field. See FIGURE 5-7 on page 66.


After an FPop completes with ftt = 0, the tem and cexc fields are logically anded
together. If the result is nonzero, aexc is left unchanged and an
fp_exception_ieee_754 trap is generated; otherwise, the new cexc field is ored into
the aexc field and no trap is generated. Thus, while (and only while) traps are
masked, exceptions are accumulated in the aexc field.


FSR.aexc can be set to a specific value when an LDFSR or LDXFSR instruction is 
executed. 
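
The accumulation rule for aexc can be written down in a few lines. The sketch below treats tem, aexc, and cexc as 5-bit masks that share the same bit order (invalid, overflow, underflow, division by zero, inexact, from bit 4 down to bit 0); the function name `complete_fpop` is a hypothetical label for the step that follows an FPop finishing with ftt = 0.

```python
def complete_fpop(tem, aexc, cexc):
    """Apply the aexc update rule after an FPop that completed with ftt = 0.

    Returns (new_aexc, trap), where trap is True when an
    fp_exception_ieee_754 trap is generated.
    """
    if tem & cexc:
        # An enabled exception occurred: trap, and leave aexc unchanged.
        return aexc, True
    # All current exceptions are masked: accumulate them into aexc.
    return aexc | cexc, False
```

For instance, with all traps masked, an inexact result followed by an overflow (which also signals inexact) leaves both the overflow and inexact bits set in aexc.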


Current Exception (cexc) 


FSR.cexc (FSR{4:0}) indicates whether one or more IEEE 754 floating-point 
exceptions were generated by the most recently executed FPop instruction. The 
absence of an exception causes the corresponding bit to be cleared (set to 0). See 
FIGURE 5-6 on page 66. 


Programming | If the FPop traps and software emulates or finishes the instruction,
Note | the system software in the trap handler is responsible for
creating a correct FSR.cexc value before returning to a
nonprivileged program.


The cexc bits are set as described in Floating-Point Exception Fields on page 65, by the
execution of an FPop that either does not cause a trap or causes an
fp_exception_ieee_754 exception with FSR.ftt = IEEE_754_exception. An IEEE 754
exception that traps shall cause exactly one bit in FSR.cexc to be set, corresponding
to the detected IEEE Std 754-1985 exception.


Floating-point operations which cause an overflow or underflow condition may also 
cause an “inexact” condition. For overflow and underflow conditions, FSR.cexc bits 
are set and trapping occurs as follows: 


■ If an IEEE 754 overflow condition occurs:

  ▪ if FSR.tem.ofm = 0 and FSR.tem.nxm = 0, the FSR.cexc.ofc and FSR.cexc.nxc bits
    are both set to 1, the other three bits of FSR.cexc are set to 0, and an
    fp_exception_ieee_754 trap does not occur.

  ▪ if FSR.tem.ofm = 0 and FSR.tem.nxm = 1, the FSR.cexc.nxc bit is set to 1, the other
    four bits of FSR.cexc are set to 0, and an fp_exception_ieee_754 trap does
    occur.

  ▪ if FSR.tem.ofm = 1, the FSR.cexc.ofc bit is set to 1, the other four bits of
    FSR.cexc are set to 0, and an fp_exception_ieee_754 trap does occur.


■ If an IEEE 754 underflow condition occurs:

  ▪ if FSR.tem.ufm = 0 and FSR.tem.nxm = 0, the FSR.cexc.ufc and FSR.cexc.nxc
    bits are both set to 1, the other three bits of FSR.cexc are set to 0, and an
    fp_exception_ieee_754 trap does not occur.

  ▪ if FSR.tem.ufm = 0 and FSR.tem.nxm = 1, the FSR.cexc.nxc bit is set to 1, the
    other four bits of FSR.cexc are set to 0, and an fp_exception_ieee_754 trap
    does occur.

  ▪ if FSR.tem.ufm = 1, the FSR.cexc.ufc bit is set to 1, the other four bits of
    FSR.cexc are set to 0, and an fp_exception_ieee_754 trap does occur.


The above behavior is summarized in TABLE 5-8 (where "✓" indicates "exception was
detected" and "x" indicates "don't care"):


TABLE 5-8 Setting of FSR.cexc Bits

             Conditions                          Results
Exception(s) Detected   Trap Enable Mask bits    fp_exception_ieee_754   Current Exception bits
in F.p. operation       (in FSR.tem)             Trap Occurs?            (in FSR.cexc)
of    uf    nx          ofm   ufm   nxm                                  ofc   ufc   nxc
-     -     -           x     x     x            no                      0     0     0
-     -     ✓           x     x     0            no                      0     0     1
-     ✓¹    ✓           x     0     0            no                      0     1     1
✓²    -     ✓²          0     x     0            no                      1     0     1
-     -     ✓           x     x     1            yes                     0     0     1
-     ✓¹    ✓           x     0     1            yes                     0     0     1
-     ✓     -           x     1     x            yes                     0     1     0
-     ✓     ✓           x     1     x            yes                     0     1     0
✓²    -     ✓²          1     x     x            yes                     1     0     0
✓²    -     ✓²          0     x     1            yes                     0     0     1

Notes: ¹ When the underflow trap is disabled (FSR.tem.ufm = 0),
underflow is always accompanied by inexact.
² Overflow is always accompanied by inexact.


If the execution of an FPop causes a trap other than fp_exception_ieee_754,
FSR.cexc is left unchanged.
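
The overflow, underflow, and inexact rules summarized in TABLE 5-8 can be expressed as one decision function. This is only a sketch: `cexc_for` is a hypothetical name, every argument is 0 or 1, and the invalid and division-by-zero exceptions are left out because the table does not cover them. Following the table notes, overflow (and untrapped underflow) is assumed to be accompanied by inexact.

```python
def cexc_for(of, uf, nx, ofm, ufm, nxm):
    """Return (trap, ofc, ufc, nxc) for detected exceptions and tem bits."""
    if of:
        # Overflow is always accompanied by inexact (table note 2).
        if ofm:
            return (True, 1, 0, 0)
        if nxm:
            return (True, 0, 0, 1)
        return (False, 1, 0, 1)
    if uf:
        if ufm:
            return (True, 0, 1, 0)
        # With ufm = 0, underflow is always accompanied by inexact (note 1).
        if nxm:
            return (True, 0, 0, 1)
        return (False, 0, 1, 1)
    if nx:
        return (bool(nxm), 0, 0, 1)
    return (False, 0, 0, 0)
```

Each row of the table corresponds to one call; for example, an overflow with ofm = 0 and nxm = 1 traps with only nxc set, because the enabled inexact exception is the one that is reported.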


Floating-Point Exception Fields 


The current and accrued exception fields and the trap enable mask assume the 
following definitions of the floating-point exception conditions (per IEEE Std 754- 
1985): 
FIGURE 5-6 Trap Enable Mask (tem) Fields of FSR: nvm (bit 27), ofm (26), ufm (25), dzm (24), nxm (23); all RW


FIGURE 5-7 Accrued Exception Bits (aexc) Fields of FSR: nva (bit 9), ofa (8), ufa (7), dza (6), nxa (5); all RW


FIGURE 5-8 Current Exception Bits (cexc) Fields of FSR: nvc (bit 4), ofc (3), ufc (2), dzc (1), nxc (0); all RW


Invalid (nvc, nva). An operand is improper for the operation to be performed.
For example, 0.0 ÷ 0.0 and ∞ - ∞ are invalid; 1 = invalid operand(s), 0 = valid
operand(s).


Overflow (ofc, ofa). The result, rounded as if the exponent range were 
unbounded, would be larger in magnitude than the destination format’s largest 
finite number; 1 = overflow, 0 = no overflow. 


Underflow (ufc, ufa). The rounded result is inexact and would be smaller in 
magnitude than the smallest normalized number in the indicated format; 
1 = underflow, 0 = no underflow. 


Underflow is never indicated when the correct unrounded result is 0. 
Otherwise, when the correct unrounded result is not 0: 


If FSR.tem.ufm = 0: Underflow occurs if a nonzero result is tiny and a loss of 
accuracy occurs. 


If FSR.tem.ufm = 1: Underflow occurs if a nonzero result is tiny. 


The SPARC V9 architecture allows tininess to be detected either before or after
rounding. However, in all cases and regardless of the setting of FSR.tem.ufm, an
UltraSPARC Architecture strand detects tininess before rounding
(impl. dep. #55-V8-Cs10). See Trapped Underflow Definition (ufm = 1) on page 367 and Untrapped
Underflow Definition (ufm = 0) on page 367 for additional details.


Division by zero (dzc, dza). An infinite result is produced exactly from finite
operands. For example, X ÷ 0.0, where X is subnormal or normalized; 1 = division
by zero, 0 = no division by zero.


Inexact (nxc, nxa). The rounded result of an operation differs from the infinitely
precise unrounded result; 1 = inexact result, 0 = exact result.


FSR Conformance 


An UltraSPARC Architecture implementation implements the tem, cexc, and aexc 
fields of FSR in hardware, conforming to IEEE Std 754-1985 (impl. dep. #22-V8). 


Programming | Privileged software (or a combination of privileged and
Note | nonprivileged software) must be capable of simulating the
operation of the FPU in order to handle the fp_exception_other
(with FSR.ftt = unfinished_FPop or unimplemented_FPop) and
IEEE_754_exception floating-point trap types properly. Thus, a
user application program always sees an FSR that is fully
compliant with IEEE Std 754-1985.








5.5 Ancillary State Registers


The SPARC V9 architecture defines several optional ancillary state registers (ASRs) 
and allows for additional ones. Access to a particular ASR may be privileged or 
nonprivileged. 


An ASR is read and written with the Read State Register and Write State Register 
instructions, respectively. These instructions are privileged if the accessed register is 
privileged. 


The SPARC V9 architecture left ASRs numbered 16-31 available for
implementation-dependent uses. UltraSPARC Architecture virtual processors implement the ASRs
summarized in TABLE 5-9 and defined in the following subsections.


Each virtual processor contains its own set of ASRs; ASRs are not shared among 
virtual processors. 





TABLE 5-9 ASR Register Summary

ASR                                                                              Read by                    Written by
number  ASR name         Register                                                Instruction(s)             Instruction(s)
0       Y^D              Y register (deprecated)                                 RDY^D                      WRY^D
1       —                Reserved                                                —                          —
2       CCR              Condition Codes register                                RDCCR                      WRCCR
3       ASI              ASI register                                            RDASI                      WRASI
4       TICK^Pnpt        TICK register                                           RDTICK^Pnpt,               WRPR^P (TICK)
                                                                                 RDPR^P (TICK)
5       PC               Program Counter (PC)                                    RDPC                       (all instructions)
6       FPRS             Floating-Point Registers Status register                RDFPRS                     WRFPRS
7-14    —                Reserved                                                —                          —
15      —                Reserved                                                —                          —
16-31   non-SPARC V9 ASRs                                                        —                          —
16      PCR^P            Performance Control registers (PCR)                     RDPCR^P                    WRPCR^P
17      PIC^P            Performance Instrumentation Counters (PIC)              RDPIC^Ppic                 WRPIC^Ppic
18      —                Implementation dependent (impl. dep.                    —                          —
                         #8-V8-Cs20, 9-V8-Cs20)
19      GSR              General Status register (GSR)                           RDGSR, FALIGNDATA,         WRGSR, BMASK, SIAM
                                                                                 many VIS and
                                                                                 floating-point
                                                                                 instructions
20      SOFTINT_SET^P    (pseudo-register, for "Write 1s Set" to                 —                          WRSOFTINT_SET^P
                         SOFTINT register, ASR 22)
21      SOFTINT_CLR^P    (pseudo-register, for "Write 1s Clear" to               —                          WRSOFTINT_CLR^P
                         SOFTINT register, ASR 22)
22      SOFTINT^P        per-virtual processor Soft Interrupt register           RDSOFTINT^P                WRSOFTINT^P
23      TICK_CMPR^P      Tick Compare register                                   RDTICK_CMPR^P              WRTICK_CMPR^P
24      STICK^Pnpt       System Tick register                                    RDSTICK^Pnpt               —
25      STICK_CMPR^P     System Tick Compare register                            RDSTICK_CMPR^P             WRSTICK_CMPR^P
26-31   —                Implementation dependent (impl. dep.                    —                          —
                         #8-V8-Cs20, 9-V8-Cs20)


5.5.1 32-bit Multiply/Divide Register (Y) (ASR 0)


The Y register is deprecated; it is provided only for compatibility with previous
versions of the architecture. It should not be used in new SPARC V9 software.
It is recommended that all instructions that reference the Y register (that is,
SMUL, SMULcc, UMUL, UMULcc, MULScc, SDIV, SDIVcc, UDIV, UDIVcc,
RDY, and WRY) be avoided. For suitable substitute instructions, see the
following pages: for the multiply instructions, see pages 311 and 356; for
the multiply step instruction, see page 270; for division instructions, see pages
304 and 354; for the read instruction, see page 288; and for the write
instruction, see page 359.


The low-order 32 bits of the Y register, illustrated in FIGURE 5-9, contain the more
significant word of the 64-bit product of an integer multiplication, as a result of
either a 32-bit integer multiply (SMUL, SMULcc, UMUL, UMULcc) instruction or an
integer multiply step (MULScc) instruction. The Y register also holds the more
significant word of the 64-bit dividend for a 32-bit integer divide (SDIV, SDIVcc,
UDIV, UDIVcc) instruction.


FIGURE 5-9 Y Register: bits 63-32 (R), bits 31-0 (RW)


Although Y is a 64-bit register, its high-order 32 bits always read as 0.


The Y register may be explicitly read and written by the RDY and WRY instructions,
respectively.


5.5.2 Integer Condition Codes Register (CCR) (ASR 2)


The Condition Codes Register (CCR), shown in FIGURE 5-10, contains the integer 
condition codes. The CCR register may be explicitly read and written by the RDCCR 
and WRCCR instructions, respectively. 


FIGURE 5-10 Condition Codes Register: xcc (bits 7-4, RW), icc (bits 3-0, RW)
5.5.2.1 Condition Codes (CCR.xcc and CCR.icc)


All instructions that set integer condition codes set both the xcc and icc fields. The
xcc condition codes indicate the result of an operation when viewed as a 64-bit
operation. The icc condition codes indicate the result of an operation when viewed
as a 32-bit operation. For example, if an operation results in the 64-bit value
0000 0000 FFFF FFFF₁₆, the 32-bit result is negative (icc.n is set to 1) but the 64-bit
result is nonnegative (xcc.n is set to 0).
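
The difference between the 32-bit and 64-bit views can be checked with a short Python sketch that derives the n (negative) and z (zero) bits of both fields from one 64-bit result. The helper name `nz_flags` is hypothetical; the v and c bits are omitted because they depend on the operands as well as on the result.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF
MASK32 = 0xFFFF_FFFF


def nz_flags(result):
    """Return the (n, z) bit pairs of xcc and icc for a 64-bit result."""
    r64 = result & MASK64
    r32 = r64 & MASK32
    return {
        "xcc": (r64 >> 63, int(r64 == 0)),   # sign and zero test on 64 bits
        "icc": (r32 >> 31, int(r32 == 0)),   # sign and zero test on low 32 bits
    }
```

Calling `nz_flags(0x0000_0000_FFFF_FFFF)` gives n = 1 for icc and n = 0 for xcc, which is the case described in the example above.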


Each of the 4-bit condition-code fields is composed of four 1-bit subfields, as shown 
in FIGURE 5-11. 


FIGURE 5-11 Integer Condition Codes (CCR.icc and CCR.xcc):
xcc: n (bit 7), z (bit 6), v (bit 5), c (bit 4); all RW
icc: n (bit 3), z (bit 2), v (bit 1), c (bit 0); all RW


The n bits indicate whether the two's-complement ALU result was negative for the 
last instruction that modified the integer condition codes; 1 = negative, 0 = not 
negative. 


The z bits indicate whether the ALU result was zero for the last instruction that 
modified the integer condition codes; 1 = zero, 0 = nonzero. 


The v bits signify whether the ALU result was within the range of (was 
representable in) 64-bit (xcc) or 32-bit (icc) two's complement notation for the last 
instruction that modified the integer condition codes; 1 = overflow, 0 = no overflow. 


The c bits indicate whether a 2's complement carry (or borrow) occurred during the 
last instruction that modified the integer condition codes. Carry is set on addition if 
there is a carry out of bit 63 (xcc) or bit 31 (icc). Carry is set on subtraction if there is 
a borrow into bit 63 (xcc) or bit 31 (icc); 1 = borrow, 0 = no borrow (see TABLE 5-10). 


TABLE 5-10 Setting of Carry (Borrow) bits for Subtraction That Sets CCs

Unsigned Comparison of Operand Values   Setting of Carry bits in CCR
R[rs1]{31:0} ≥ R[rs2]{31:0}             CCR.icc.c ← 0
R[rs1]{31:0} < R[rs2]{31:0}             CCR.icc.c ← 1
R[rs1]{63:0} ≥ R[rs2]{63:0}             CCR.xcc.c ← 0
R[rs1]{63:0} < R[rs2]{63:0}             CCR.xcc.c ← 1
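
The borrow rule of TABLE 5-10 amounts to an unsigned comparison of the operands at each width: a borrow occurs exactly when the first operand is smaller than the second. A minimal sketch, with `subcc_carry` as a hypothetical helper that takes the two register values as Python integers:

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF
MASK32 = 0xFFFF_FFFF


def subcc_carry(rs1, rs2):
    """Return (icc_c, xcc_c) for a subtraction rs1 - rs2 that sets the CCs."""
    a64, b64 = rs1 & MASK64, rs2 & MASK64
    a32, b32 = a64 & MASK32, b64 & MASK32
    # A borrow occurs when the minuend is smaller (unsigned) than the subtrahend.
    return int(a32 < b32), int(a64 < b64)
```

The two carry bits can disagree: subtracting 1 from 1 0000 0000₁₆ borrows in the 32-bit view (the low word is 0) but not in the 64-bit view.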


Both fields of CCR (xcc and icc) are modified by arithmetic and logical instructions,
the names of which end with the letters "cc" (for example, ANDcc), and by the
WRCCR instruction. They can be modified by a DONE or RETRY instruction, which
replaces these bits with the contents of TSTATE.ccr. The behavior of the following
instructions is conditioned by the contents of CCR.icc or CCR.xcc:


■ BPcc and Tcc instructions (conditional transfer of control)
■ Bicc (conditional transfer of control, based on CCR.icc only)
■ MOVcc instruction (conditionally move the contents of an integer register)
■ FMOVcc instruction (conditionally move the contents of a floating-point register)


Extended (64-bit) integer condition codes (xcc). Bits 7 through 4 are the IU 
condition codes, which indicate the results of an integer operation, with both of the 
operands and the result considered to be 64 bits wide. 


32-bit Integer condition codes (icc). Bits 3 through 0 are the IU condition codes, 
which indicate the results of an integer operation, with both of the operands and the 
result considered to be 32 bits wide. 


Address Space Identifier (ASI) Register 
(ASR 3) 


The Address Space Identifier register (FIGURE 5-12) specifies the address space 
identifier to be used for load and store alternate instructions that use the “rs1 + 
simm13” addressing form. 


The ASI register may be explicitly read and written by the RDASI and WRASI 
instructions, respectively. 


Software (executing in any privilege mode) may write any value into the ASI
register. However, values in the range 00₁₆ to 7F₁₆ are "restricted" ASIs; an attempt
to perform an access using an ASI in that range is restricted to software executing in
a mode with sufficient privileges for the ASI. When an instruction executing in
nonprivileged mode attempts an access using an ASI in the range 00₁₆ to 7F₁₆ or an
instruction executing in privileged mode attempts an access using an ASI in the range
30₁₆ to 7F₁₆, a privileged_action exception is generated. See Chapter 10, Address Space
Identifiers (ASIs) for details.
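
The privilege check on restricted ASIs can be sketched as follows. The function name `asi_access_exception` and the mode strings are illustrative assumptions; the sketch models only the privileged_action rule stated above and ignores every other check that an access with a given ASI may be subject to.

```python
def asi_access_exception(asi, mode):
    """Return "privileged_action" if the access is not permitted, else None.

    mode is one of "nonprivileged", "privileged", or "hyperprivileged".
    """
    if mode == "nonprivileged" and 0x00 <= asi <= 0x7F:
        return "privileged_action"
    if mode == "privileged" and 0x30 <= asi <= 0x7F:
        return "privileged_action"
    return None
```

Note that writing a restricted value into the ASI register is always allowed; under this model the exception arises only when an access is attempted with it.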


FIGURE 5-12 Address Space Identifier Register 


Tick (TICK) Register (ASR 4) 


FIGURE 5-13 illustrates the TICK register. 


FIGURE 5-13 TICK Register: npt (bit 63, R), counter (bits 62-0, R)


The counter field of the TICK register is a 63-bit counter that counts strand clock 
cycles. 


Bit 63 of the TICK register is the nonprivileged trap (npt) bit, which controls 
access to the TICK register by nonprivileged software. 


Privileged software can always read the TICK register, with either the RDPR or 
RDTICK instruction. 


Privileged software cannot write to the TICK register; an attempt to do so (with the
WRPR instruction) results in an illegal_instruction exception.


Nonprivileged software can read the TICK register by using the RDTICK instruction,
but only when nonprivileged access to TICK is enabled by hyperprivileged software.
If nonprivileged access is disabled, an attempt by nonprivileged software to read the
TICK register using the RDTICK instruction causes a privileged_action exception.


An attempt by nonprivileged software at any time to read the TICK register using
the privileged RDPR instruction causes a privileged_opcode exception.


Nonprivileged software cannot write the TICK register. An attempt by nonprivileged
software to write the TICK register using the privileged WRPR instruction causes a
privileged_opcode exception.


The difference between the values read from the TICK register on two reads is 
intended to reflect the number of strand cycles executed between the reads. 


Programming | If a single TICK register is shared among multiple virtual 
Note | processors, then the difference between subsequent reads of 
TICK.counter reflects a shared cycle count, not a count specific to 
the virtual processor reading the TICK register. 


IMPL. DEP. #105-V9: (a) If an accurate count cannot always be returned when TICK 
is read, any inaccuracy should be small, bounded, and documented. 

(b) An implementation may implement fewer than 63 bits in TICK.counter; however, 
the counter as implemented must be able to count for at least 10 years without 
overflowing. Any upper bits not implemented must read as zero. 
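
The layout of TICK and the 10-year requirement of impl. dep. #105-V9(b) can be illustrated with a little arithmetic. In the sketch below, `split_tick`, `elapsed`, and `min_counter_bits` are hypothetical helpers; the 1.2 GHz clock rate in the usage note is an arbitrary example, not a property of any particular implementation, and a full 63-bit counter that wraps to zero is assumed.

```python
NPT_BIT = 1 << 63
COUNTER_MASK = NPT_BIT - 1        # bits 62-0


def split_tick(tick):
    """Return (npt, counter) from a 64-bit TICK value."""
    return (tick >> 63) & 1, tick & COUNTER_MASK


def elapsed(first, second):
    """Cycles between two TICK reads, allowing for one counter wrap."""
    return ((second & COUNTER_MASK) - (first & COUNTER_MASK)) & COUNTER_MASK


def min_counter_bits(clock_hz, years=10):
    """A counter width sufficient to count for `years` without overflowing.

    Uses 366-day years so that the estimate errs on the safe side.
    """
    cycles = clock_hz * years * 366 * 24 * 3600
    return cycles.bit_length()
```

Under these assumptions, `min_counter_bits(1_200_000_000)` evaluates to 59, so a hypothetical 1.2 GHz strand would need at least 59 of the 63 counter bits to satisfy the 10-year rule.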


Program Counters (PC, NPC) (ASR 5) 


The PC contains the address of the instruction currently being executed. The
least-significant two bits of PC always contain zeroes.


The PC can be read directly with the RDPC instruction. PC cannot be explicitly
written by any instruction (including Write State Register), but is implicitly written
by control transfer instructions. A WRasr to ASR 5 causes an illegal_instruction
exception.


The Next Program Counter, NPC, is a pseudo-register that contains the address of 
the next instruction to be executed if a trap does not occur. The least-significant two 
bits of NPC always contain zeroes. 


NPC is written implicitly by control transfer instructions. However, NPC cannot be 
read or written explicitly by any instruction. 


PC and NPC can be indirectly set by privileged software that writes to TPC[TL] 
and/or TNPC[TL] and executes a RETRY instruction. 


See Chapter 6, Instruction Set Overview, for details on how PC and NPC are used. 


Floating-Point Registers State (FPRS) Register 
(ASR 6) 


The Floating-Point Registers State (FPRS) register, shown in FIGURE 5-14, contains 
control information for the floating-point register file; this information is readable 
and writable by nonprivileged software. 


FIGURE 5-14 Floating-Point Registers State Register: fef (bit 2, RW), du (bit 1, RW), dl (bit 0, RW)


The FPRS register may be explicitly read and written by the RDFPRS and WRFPRS 
instructions, respectively. 


Enable FPU (fef). Bit 2, fef, determines whether the FPU is enabled. If it is
disabled, executing a floating-point instruction causes an fp_disabled trap. If this bit
is set (FPRS.fef = 1) but the PSTATE.pef bit is not set (PSTATE.pef = 0), then
executing a floating-point instruction causes an fp_disabled exception; that is, both
FPRS.fef and PSTATE.pef must be set to 1 to enable floating-point operations.


Programming | FPRS.fef can be used by application software to notify system
Note | software that the application does not require the contents of the
F registers to be preserved. Depending on system software, this
may provide some performance benefit; for example, the F
registers would not have to be saved or restored during context
switches to or from that application. Once an application sets
FPRS.fef to 0, it must assume that the values in all F registers
are volatile (may change at any time).
Dirty Upper Registers (du). Bit 1 is the "dirty" bit for the upper half of the
floating-point registers; that is, F[32]-F[62]. It is set to 1 whenever any of the upper
floating-point registers is modified. The du bit is cleared only by software.


IMPL. DEP. #403-S10(a): An UltraSPARC Architecture 2005 virtual processor may 
set FPRS.du pessimistically; that is, it may be set whenever an FPop is issued, even 
though no destination F register is modified. The specific conditions under which a 
dirty bit is set pessimistically are implementation dependent. 


Dirty Lower Registers (dl). Bit 0 is the "dirty" bit for the lower 32 floating-point
registers; that is, F[0]-F[31]. It is set to 1 whenever any of the lower floating-point
registers is modified. The dl bit is cleared only by software.


IMPL. DEP. #403-S10(b): An UltraSPARC Architecture 2005 virtual processor may 
set FPRS.dl pessimistically; that is, it may be set whenever an FPop is issued, even 
though no destination F register is modified. The specific conditions under which a 
dirty bit is set pessimistically are implementation dependent. 
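
As an illustration of how system software might use the dirty bits, the following Python sketch decodes an FPRS value and reports which halves of the F register file would need saving. The policy shown (save a half only if its dirty bit is set) and the name halves_to_save are assumptions for illustration; the bit positions are those given above.

```python
FPRS_DL = 1 << 0   # dirty lower registers, F[0]-F[31]
FPRS_DU = 1 << 1   # dirty upper registers, F[32]-F[62]
FPRS_FEF = 1 << 2  # enable FPU


def halves_to_save(fprs: int) -> list:
    """Return the halves of the F register file whose dirty bit is set."""
    halves = []
    if fprs & FPRS_DL:
        halves.append("lower")
    if fprs & FPRS_DU:
        halves.append("upper")
    return halves


print(halves_to_save(FPRS_FEF | FPRS_DU))
```

Because a dirty bit may be set pessimistically, such a policy can save more state than strictly necessary, but never less.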


Implementation | If an instruction that normally writes to the F registers is
Note           | executed and causes an fp_disabled exception, an UltraSPARC
               | Architecture 2005 implementation still sets the "dirty" bit
               | (FPRS.du or FPRS.dl) corresponding to the destination register
               | to 1.


Forward        | It is expected that in future revisions to the UltraSPARC
Compatibility  | Architecture, if an instruction that normally writes to the F
Note           | registers is executed and causes an fp_disabled exception, the
               | "dirty" bit (FPRS.du or FPRS.dl) corresponding to the
               | destination register will be left unchanged.





Performance Control Register (PCR) (ASR 16)


The PCR is used to control performance monitoring events collected in counter 
pairs, which are accessed via the Performance Instrumentation Counter (PIC) 
register (ASR 17) (see page 75). Unused PCR bits read as zero; they should be 
written only with zeroes or with values previously read from them. 


When the virtual processor is operating in privileged mode (PSTATE.priv = 1), PCR 
may be freely read and written by software. 


When the virtual processor is operating in nonprivileged mode (PSTATE.priv = 0), an
attempt to access PCR (using a RDPCR or WRPCR instruction) results in a
privileged_opcode exception (impl. dep. #250-U3-Cs10).


The PCR is illustrated in FIGURE 5-15 and described in TABLE 5-11.

FIGURE 5-15 Performance Control Register (PCR) (ASR 16)


IMPL. DEP. #207-U3: The values and semantics of bits 47:32, 26:17, and bit 3 of the 
PCR are implementation dependent. 


TABLE 5-11 PCR Bit Description

Bit     Field   Description
47:32   -       These bits are implementation dependent (impl. dep. #207-U3).
26:17   -       These bits are implementation dependent (impl. dep. #207-U3).
16:11   su      Six-bit field selecting 1 of 64 event counts in the upper half (bits {63:32}) of the PIC.
9:4     sl      Six-bit field selecting 1 of 64 event counts in the lower half (bits {31:0}) of the PIC.
3       -       This bit is implementation dependent (impl. dep. #207-U3).
2       ut      User Trace Enable. If set to 1, events in nonprivileged (user) mode are counted.
1       st      System Trace Enable. If set to 1, events in privileged (system) mode are counted.
                Notes:
                  If both PCR.ut and PCR.st are set to 1, all selected events are counted.
                  If both PCR.ut and PCR.st are zero, counting is disabled.
                  PCR.ut and PCR.st are global fields which apply to all PIC pairs.
0       priv    Privileged. Controls access to the PIC register (via RDPIC or WRPIC instructions). If
                PCR.priv = 0, an attempt to access PIC will succeed regardless of the privilege state
                (PSTATE.priv). If PCR.priv = 1, access to PIC is restricted to privileged software; that is, an
                attempt to access PIC while PSTATE.priv = 1 will succeed, but an attempt to access PIC while
                PSTATE.priv = 0 will result in a privileged_action exception.
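
The field layout in TABLE 5-11 can be exercised with a short decoding sketch. This Python model is illustrative only; the helper names field, decode_pcr, and pic_access are hypothetical, and the position of st at bit 1 is assumed from its placement between ut (bit 2) and priv (bit 0).

```python
def field(value: int, hi: int, lo: int) -> int:
    """Extract bits hi:lo (inclusive) from value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_pcr(pcr: int) -> dict:
    """Decode the architecturally defined PCR fields."""
    return {
        "su": field(pcr, 16, 11),   # event select for picu
        "sl": field(pcr, 9, 4),     # event select for picl
        "ut": field(pcr, 2, 2),     # count events in nonprivileged mode
        "st": field(pcr, 1, 1),     # count events in privileged mode (bit 1 assumed)
        "priv": field(pcr, 0, 0),   # restrict PIC access to privileged software
    }


def pic_access(pcr: int, pstate_priv: int) -> str:
    """Outcome of a RDPIC/WRPIC attempt under the PCR.priv rule."""
    if decode_pcr(pcr)["priv"] == 1 and pstate_priv == 0:
        return "privileged_action"
    return "ok"


example = (5 << 11) | (9 << 4) | (1 << 2) | 1
print(decode_pcr(example), pic_access(example, pstate_priv=0))
```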





5.5.8 Performance Instrumentation Counter (PIC) Register (ASR 17)


PIC contains two 32-bit counters that count performance-related events (such as 
instruction counts, cache misses, TLB misses, and pipeline stalls). Which events are 
actively counted at any given time is selected by the PCR register. 


The difference between the values read from the PIC register at two different times 
reflects the number of events that occurred between register reads. Software can only 
rely on the difference in counts between two PIC reads to get an accurate count, not 
on the difference in counts between a PIC write and a PIC read. 
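
A minimal Python sketch of the read-to-read difference follows. It assumes each 32-bit counter wraps at most once between the two reads, so that the difference can be taken modulo 2^32; the names split_pic and pic_delta are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def split_pic(pic: int) -> tuple:
    """Split a 64-bit PIC value into its two 32-bit counters (upper, lower)."""
    return (pic >> 32) & MASK32, pic & MASK32


def pic_delta(before: int, after: int) -> tuple:
    """Events counted between two PIC reads, per counter, modulo 2**32."""
    before_u, before_l = split_pic(before)
    after_u, after_l = split_pic(after)
    return (after_u - before_u) & MASK32, (after_l - before_l) & MASK32


# The upper counter wraps between the reads; the lower one does not.
first_read = (0xFFFF_FFF0 << 32) | 100
second_read = (0x0000_0010 << 32) | 250
print(pic_delta(first_read, second_read))
```

The modulo arithmetic keeps the difference correct across a single wrap of either counter.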


PIC is normally a nonprivileged-access, read/write register. However, if the priv bit
of the PCR (ASR 16) is set, attempted access by nonprivileged (user) code causes a
privileged_action exception.

Multiple PICs may be implemented. Each is accessed through ASR 17, using an
implementation-dependent PIC pair selection field in PCR (ASR 16) (impl. dep.
#207-U3). Read/write access to the PIC will access the picu/picl counter pair selected
by PCR.


The PIC is described below and illustrated in FIGURE 5-16.


Bit     Field   Description
63:32   picu    32-bit counter representing the count of an event selected by the su field of the
                Performance Control Register (PCR) (ASR 16).
31:0    picl    32-bit counter representing the count of an event selected by the sl field of the
                Performance Control Register (PCR) (ASR 16).


         RW        RW
PIC   | picu   |  picl  |
       63    32 31     0
FIGURE 5-16 Performance Instrumentation Counter (PIC) (ASR 17)


Counter Overflow. On overflow, the effective counter wraps to 0, SOFTINT
register bit 15 is set to 1, and an interrupt_level_15 trap is generated if not masked by
PSTATE.ie and PIL. The counter overflow trap is triggered on the transition from
value FFFF FFFF₁₆ to value 0.


5.5.9 General Status Register (GSR) (ASR 19)


The General Status Register¹ (GSR) is a nonprivileged read/write register that is
implicitly referenced by many VIS instructions. The GSR can be read by the RDGSR
instruction (see Read Ancillary State Register on page 287) and written by the WRGSR
instruction (see Write Ancillary State Register on page 358).


If the FPU is disabled (PSTATE.pef = 0 or FPRS.fef = 0), an attempt to access this
register using an otherwise-valid RDGSR or WRGSR instruction causes an
fp_disabled trap.


The GSR is illustrated in FIGURE 5-17 and described in TABLE 5-12.


         RW      -     RW    RW     -      RW      RW
GSR  | mask  |  -  | im | irnd |  -  | scale | align |
      63   32 31 28  27  26  25 24  8  7    3  2    0
FIGURE 5-17 General Status Register (GSR) (ASR 19)


1. This register was (inaccurately) referred to as the "Graphics Status Register" in early UltraSPARC
implementations.

TABLE 5-12 GSR Bit Description

Bit     Field   Description
63:32   mask    This 32-bit field specifies the mask used by the BSHUFFLE instruction. The field
                contents are set by the BMASK instruction.
31:28   -       Reserved.
27      im      Interval Mode: If GSR.im = 0, rounding is performed according to FSR.rd; if
                GSR.im = 1, rounding is performed according to GSR.irnd.
26:25   irnd    IEEE Std 754-1985 rounding direction to use in Interval Mode (GSR.im = 1), as follows:
                  irnd   Round toward ...
                  0      Nearest (even, if tie)
                  1      0
                  2      +∞
                  3      -∞
24:8    -       Reserved.
7:3     scale   5-bit shift count in the range 0-31, used by the FPACK instructions for formatting.
2:0     align   Least significant three bits of the address computed by the last-executed
                ALIGNADDRESS or ALIGNADDRESS_LITTLE instruction.
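
To make the GSR layout concrete, the following Python sketch decodes a 64-bit GSR value into the fields of TABLE 5-12 and reports which register supplies the rounding direction. The names decode_gsr and rounding_source are hypothetical, and the sample value is arbitrary.

```python
ROUND_TOWARD = {0: "nearest (even, if tie)", 1: "0", 2: "+infinity", 3: "-infinity"}


def decode_gsr(gsr: int) -> dict:
    """Decode the defined GSR fields; reserved bits 31:28 and 24:8 are skipped."""
    return {
        "mask": (gsr >> 32) & 0xFFFF_FFFF,  # bits 63:32, used by BSHUFFLE
        "im": (gsr >> 27) & 0x1,            # bit 27, Interval Mode
        "irnd": (gsr >> 25) & 0x3,          # bits 26:25, rounding direction
        "scale": (gsr >> 3) & 0x1F,         # bits 7:3, FPACK shift count
        "align": gsr & 0x7,                 # bits 2:0, from ALIGNADDRESS
    }


def rounding_source(gsr: int) -> str:
    """Rounding follows GSR.irnd only when Interval Mode is on."""
    return "GSR.irnd" if decode_gsr(gsr)["im"] == 1 else "FSR.rd"


sample = (0xDEADBEEF << 32) | (1 << 27) | (2 << 25) | (17 << 3) | 5
fields = decode_gsr(sample)
print(fields, rounding_source(sample), ROUND_TOWARD[fields["irnd"]])
```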





5.5.10 SOFTINT Register (ASRs 20, 21, 22)


Software uses the privileged, read/write SOFTINT register (ASR 22) to schedule
interrupts (via interrupt_level_n exceptions).


SOFTINT can be read with a RDSOFTINT instruction (see Read Ancillary State
Register on page 287) and written with a WRSOFTINT, WRSOFTINT_SET, or
WRSOFTINT_CLR instruction (see Write Ancillary State Register on page 358). An
attempt to access this register in nonprivileged mode causes a privileged_opcode
exception.


Programming | To atomically modify the set of pending software interrupts, use
Note        | of the SOFTINT_SET and SOFTINT_CLR ASRs is
            | recommended.


The SOFTINT register is illustrated in FIGURE 5-18 and described in TABLE 5-13.


             -      RW       RW       RW
SOFTINT |    -   |  sm  | int_level | tm |
         63    17   16   15       1   0


FIGURE 5-18 SOFTINT Register (ASR 22)

TABLE 5-13 SOFTINT Bit Description

Bit     Field       Description
16      sm          When the STICK_CMPR (ASR 25) register's int_dis (interrupt disable) field is 0 (that is,
                    System Tick Compare is enabled) and its stick_cmpr field matches the value in the
                    STICK register, then SOFTINT.sm ("STICK match") is set to 1 and a level-14 interrupt
                    (interrupt_level_14) is generated. See System Tick Compare (STICK_CMPR) Register (ASR
                    25) on page 81 for details. SOFTINT.sm can also be directly written to 1 by software.
15:1    int_level   When SOFTINT.int_level{n-1} (SOFTINT{n}) is set to 1, an interrupt_level_n exception is
                    generated.
                    Notes:
                      A level-14 interrupt (interrupt_level_14) can be triggered by
                      SOFTINT.sm, SOFTINT.tm, or a write to SOFTINT.int_level{13}
                      (SOFTINT{14}).
                      A level-15 interrupt (interrupt_level_15) can be triggered by a write to
                      SOFTINT.int_level{14} (SOFTINT{15}), or possibly by other
                      implementation-dependent mechanisms.
                      An interrupt_level_n exception will only cause a trap if (PIL < n) and
                      (PSTATE.ie = 1).
0       tm          When the TICK_CMPR (ASR 23) register's int_dis (interrupt disable) field is 0 (that is,
                    Tick Compare is enabled) and its tick_cmpr field matches the value in the TICK register,
                    then the tm ("TICK match") field in SOFTINT is set to 1 and a level-14 interrupt
                    (interrupt_level_14) is generated. See Tick Compare (TICK_CMPR) Register (ASR 23) on
                    page 79 for details. SOFTINT.tm can also be directly written to 1 by software.





Setting any of SOFTINT.sm, SOFTINT.int_level{13} (SOFTINT{14}), or SOFTINT.tm
to 1 causes a level-14 interrupt (interrupt_level_14). However, those three bits are
independent; setting any one of them does not affect the other two.
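
Because three independent bits can post a level-14 interrupt, a handler has to inspect all of them. The following Python sketch shows one way to list the active sources from a SOFTINT value; the function name level14_sources is hypothetical.

```python
SOFTINT_TM = 1 << 0         # TICK match
SOFTINT_LEVEL14 = 1 << 14   # int_level{13}, that is, SOFTINT{14}
SOFTINT_SM = 1 << 16        # STICK match


def level14_sources(softint: int) -> list:
    """Return every SOFTINT bit that is currently requesting a level-14 interrupt."""
    sources = []
    if softint & SOFTINT_SM:
        sources.append("sm")
    if softint & SOFTINT_LEVEL14:
        sources.append("int_level{13}")
    if softint & SOFTINT_TM:
        sources.append("tm")
    return sources


print(level14_sources(SOFTINT_SM | SOFTINT_TM))
```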


See Software Interrupt Register (SOFTINT) on page 456 for additional information 
regarding the SOFTINT register. 


5.5.10.1 SOFTINT_SET Pseudo-Register (ASR 20)


A Write State register instruction to ASR 20 (WRSOFTINT_SET) atomically sets
selected bits in the privileged SOFTINT Register (ASR 22) (see page 77). That is, bits
16:0 of the write data are ORed into SOFTINT; any '1' bit in the write data causes the
corresponding bit of SOFTINT to be set to 1. Bits 63:17 of the write data are ignored.


Access to ASR 20 is privileged and write-only. There is no instruction to read this
pseudo-register. An attempt to write to ASR 20 in nonprivileged mode, using the
WRasr instruction, causes a privileged_opcode exception.


Programming | There is no actual "register" (machine state) corresponding to
Note        | ASR 20; it is just a programming interface to conveniently set
            | selected bits to '1' in the SOFTINT register, ASR 22.


FIGURE 5-19 illustrates the SOFTINT_SET pseudo-register.

FIGURE 5-19 SOFTINT_SET Pseudo-Register (ASR 20)


5.5.10.2 SOFTINT_CLR Pseudo-Register (ASR 21)


A Write State register instruction to ASR 21 (WRSOFTINT_CLR) atomically clears
selected bits in the privileged SOFTINT register (ASR 22) (see page 77). That is, bits
16:0 of the write data are inverted and ANDed into SOFTINT; any '1' bit in the write
data causes the corresponding bit of SOFTINT to be set to 0. Bits 63:17 of the write
data are ignored.
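
The set and clear semantics of the two pseudo-registers can be modeled in a few lines of Python. This is a sketch of the bit arithmetic only (atomicity is a hardware property and is not modeled); the function names wr_softint_set and wr_softint_clr are hypothetical.

```python
MASK17 = (1 << 17) - 1  # only bits 16:0 of the write data take part


def wr_softint_set(softint: int, data: int) -> int:
    """Model WRSOFTINT_SET: OR bits 16:0 of data into SOFTINT."""
    return softint | (data & MASK17)


def wr_softint_clr(softint: int, data: int) -> int:
    """Model WRSOFTINT_CLR: clear each SOFTINT bit whose data bit (16:0) is 1."""
    return softint & ~(data & MASK17)


pending = wr_softint_set(0b0101, 0b0011)     # bits 0, 1, and 2 are now set
pending = wr_softint_clr(pending, 0b0101)    # bits 0 and 2 are cleared again
print(bin(pending))
```

Write data bits 63:17 have no effect in either direction, which is why both helpers mask the data first.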


Access to ASR 21 is privileged and write-only. There is no instruction to read this
pseudo-register. An attempt to write to ASR 21 in nonprivileged mode, using the
WRasr instruction, causes a privileged_opcode exception.


Programming | There is no actual "register" (machine state) corresponding to
Note        | ASR 21; it is just a programming interface to conveniently clear
            | (set to '0') selected bits in the SOFTINT register, ASR 22.


FIGURE 5-20 illustrates the SOFTINT_CLR pseudo-register.

FIGURE 5-20 SOFTINT_CLR Pseudo-Register (ASR 21)


5.5.11 Tick Compare (TICK_CMPR) Register (ASR 23)


The privileged TICK_CMPR register allows system software to cause a trap when
the TICK register reaches a specified value. Nonprivileged accesses to this register
cause a privileged_opcode exception (see Exception and Interrupt Descriptions on page
445).


The TICK_CMPR register is illustrated in FIGURE 5-21 and described in TABLE 5-14.


               RW         RW
TICK_CMPR | int_dis | tick_cmpr |
              63      62       0
FIGURE 5-21 TICK_CMPR Register

TABLE 5-14 TICK_CMPR Register Description

Bit     Field       Description
63      int_dis     Interrupt Disable. If int_dis = 0, TICK compare interrupts are enabled
                    and if int_dis = 1, TICK compare interrupts are disabled.
62:0    tick_cmpr   Tick Compare Field. When this field exactly matches the value in
                    TICK.counter and TICK_CMPR.int_dis = 0, SOFTINT.tm is set to 1.
                    This has the effect of posting a level-14 interrupt to the virtual
                    processor, which causes an interrupt_level_14 trap when (PIL < 14)
                    and (PSTATE.ie = 1). The level-14 interrupt handler must check
                    SOFTINT{14}, SOFTINT{0} (tm), and SOFTINT{16} (sm) to determine
                    the source of the level-14 interrupt.
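
The match-and-trap rule in TABLE 5-14 can be sketched as two small Python functions: one that posts SOFTINT.tm on a compare match, and one that decides whether the posted interrupt actually traps. The function names are hypothetical, and the model ignores timing.

```python
COUNTER_MASK = (1 << 63) - 1   # bits 62:0
LEVEL14_BITS = (1 << 16) | (1 << 14) | (1 << 0)   # sm, SOFTINT{14}, tm


def apply_tick_compare(tick_counter: int, tick_cmpr_reg: int, softint: int) -> int:
    """Set SOFTINT.tm when compare interrupts are enabled and the fields match."""
    int_dis = (tick_cmpr_reg >> 63) & 1
    tick_cmpr = tick_cmpr_reg & COUNTER_MASK
    if int_dis == 0 and tick_cmpr == (tick_counter & COUNTER_MASK):
        softint |= 1 << 0   # SOFTINT.tm
    return softint


def level14_trap_taken(softint: int, pil: int, pstate_ie: int) -> bool:
    """A posted level-14 interrupt traps only when PIL < 14 and PSTATE.ie = 1."""
    return bool(softint & LEVEL14_BITS) and pil < 14 and pstate_ie == 1


softint = apply_tick_compare(tick_counter=1000, tick_cmpr_reg=1000, softint=0)
print(softint, level14_trap_taken(softint, pil=0, pstate_ie=1))
```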


5.5.12 System Tick (STICK) Register (ASR 24)


The System Tick (STICK) register provides a counter that is synchronized across a 
system, useful for timestamping. The counter field of the STICK register is a 63-bit 
counter that increments at a rate determined by a clock signal external to the 
processor. 


Bit 63 of the STICK register is the nonprivileged trap (npt) bit, which controls
access to the STICK register by nonprivileged software.


The STICK register is illustrated in FIGURE 5-22 and described below.


          R       R
STICK | npt | counter |
        63   62      0
FIGURE 5-22 STICK Register


Privileged software can always read the STICK register with the RDSTICK
instruction.


Privileged software cannot write the STICK register; an attempt to execute the
WRSTICK instruction in privileged mode results in an illegal_instruction exception.


Nonprivileged software can read the STICK register by using the RDSTICK
instruction, but only when nonprivileged access to STICK is enabled by
hyperprivileged software. If nonprivileged access is disabled, an attempt by
nonprivileged software to read the STICK register causes a privileged_action
exception.


Nonprivileged software cannot write the STICK register; an attempt to execute the
WRSTICK instruction in nonprivileged mode results in an illegal_instruction
exception.

IMPL. DEP. #442-S10: (a) If an accurate count cannot always be returned when
STICK is read, any inaccuracy should be small, bounded, and documented.

(b) An implementation may implement fewer than 63 bits in STICK.counter; 
however, the counter as implemented must be able to count for at least 10 years 
without overflowing. Any upper bits not implemented must read as zero. 
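
The 10-year requirement translates into a minimum counter width once an increment rate is chosen. The Python sketch below works through that arithmetic; the 1 GHz rate is a hypothetical example (the actual rate is set by a clock external to the processor), a year is assumed to be 365.25 days, and the function names are illustrative.

```python
SECONDS_PER_YEAR = 31_557_600  # assumption: 365.25 days of 86,400 seconds


def min_counter_bits(freq_hz: int, years: int = 10) -> int:
    """Smallest width whose range exceeds the number of ticks in `years`."""
    ticks = freq_hz * years * SECONDS_PER_YEAR
    return ticks.bit_length()


def years_until_overflow(bits: int, freq_hz: int) -> float:
    """Time for a `bits`-wide counter starting at 0 to wrap at freq_hz."""
    return (1 << bits) / freq_hz / SECONDS_PER_YEAR


print(min_counter_bits(1_000_000_000))
print(years_until_overflow(63, 1_000_000_000))
```

At the assumed 1 GHz rate, the full 63-bit counter leaves a very wide margin over the required 10 years.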


5.5.13 System Tick Compare (STICK_CMPR) Register (ASR 25)


The privileged STICK_CMPR register allows system software to cause a trap when
the STICK register reaches a specified value. Nonprivileged accesses to this register
cause a privileged_opcode exception (see Exception and Interrupt Descriptions on page
445).

The System Tick Compare Register is illustrated in FIGURE 5-23 and described in
TABLE 5-15.

                RW          RW
STICK_CMPR | int_dis | stick_cmpr |
               63      62        0
FIGURE 5-23 STICK_CMPR Register


TABLE 5-15 STICK_CMPR Register Description

Bit     Field        Description
63      int_dis      Interrupt Disable. If set to 1, STICK_CMPR interrupts are disabled.
62:0    stick_cmpr   System Tick Compare Field. When this field exactly matches
                     STICK.counter and STICK_CMPR.int_dis = 0, SOFTINT.sm is set to
                     1. This has the effect of posting a level-14 interrupt to the virtual
                     processor, which causes an interrupt_level_14 trap when (PIL < 14)
                     and (PSTATE.ie = 1). The level-14 interrupt handler must check
                     SOFTINT{14}, SOFTINT{0} (tm), and SOFTINT{16} (sm) to
                     determine the source of the level-14 interrupt.








5.6 Register-Window PR State Registers 


The state of the register windows is determined by the contents of a set of privileged
registers. These state registers can be read/written by privileged software using the
RDPR/WRPR instructions. An attempt by nonprivileged software to execute a
RDPR or WRPR instruction causes a privileged_opcode exception. In addition, these
registers are modified by instructions related to register windows and are used to
generate traps that allow supervisor software to spill, fill, and clean register
windows.


IMPL. DEP. #126-V9-Ms10: Privileged registers CWP, CANSAVE, CANRESTORE,
OTHERWIN, and CLEANWIN contain values in the range 0 to N_REG_WINDOWS - 1.
An attempt to write a value greater than N_REG_WINDOWS - 1 to any of these
registers causes an implementation-dependent value between 0 and
N_REG_WINDOWS - 1 (inclusive) to be written to the register. Furthermore, an attempt
to write a value greater than N_REG_WINDOWS - 2 violates the register window state
definition in Register Window State Definition on page 85.

Although the width of each of these five registers is architecturally 5 bits, the width
is implementation dependent and shall be between ⌈log2(N_REG_WINDOWS)⌉ and 5
bits, inclusive. If fewer than 5 bits are implemented, the unimplemented upper bits
shall read as 0 and writes to them shall have no effect. All five registers should have
the same width.

For UltraSPARC Architecture 2005 processors, N_REG_WINDOWS = 8. Therefore, each
register window state register is implemented with 3 bits, the maximum value for
CWP and CLEANWIN is 7, and the maximum value for CANSAVE, CANRESTORE,
and OTHERWIN is 6. When these registers are written by the WRPR instruction, bits
63:3 of the data written are ignored.
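
The width arithmetic above can be checked with a short Python sketch: the minimum register width is the ceiling of log2(N_REG_WINDOWS), and a WRPR write keeps only the implemented low-order bits. The function names are hypothetical.

```python
def min_window_reg_width(n_reg_windows: int) -> int:
    """Ceiling of log2(n_reg_windows), computed with integer arithmetic."""
    return (n_reg_windows - 1).bit_length()


def wrpr_window_value(data: int, width: int = 3) -> int:
    """Value retained by a `width`-bit window state register; upper data bits are ignored."""
    return data & ((1 << width) - 1)


print(min_window_reg_width(8))          # width needed when N_REG_WINDOWS = 8
print(wrpr_window_value(0xFFFF_FFF5))   # only bits 2:0 of the write data survive
```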


For details of how the window-management registers are used, see Register Window 
Management Instructions on page 116. 


Programming | CANSAVE, CANRESTORE, OTHERWIN, and CLEANWIN must
Note        | never be set to a value greater than N_REG_WINDOWS - 2 on an
            | UltraSPARC Architecture virtual processor. Setting any of these
            | to a value greater than N_REG_WINDOWS - 2 violates the register
            | window state definition in Register Window State Definition on
            | page 85. Hardware is not required to enforce this restriction; it is
            | up to system software to keep the window state consistent.


Implementation | A write to any privileged register, including PR state registers,
Note           | may drain the CPU pipeline.





Current Window Pointer (CWP) Register (PR 9)


The privileged CWP register, shown in FIGURE 5-24, is a counter that identifies the
current window into the array of integer registers. See Register Window Management
Instructions on page 116 and Chapter 12, Traps, for information on how hardware
manipulates the CWP register.

[Register layout: CWP; bit boundaries 4, 3, 2, 0; access RW]
FIGURE 5-24 Current Window Pointer Register


5.6.2 Savable Windows (CANSAVE) Register (PR 10)


The privileged CANSAVE register, shown in FIGURE 5-25, contains the number of
register windows following CWP that are not in use and are, hence, available to be
allocated by a SAVE instruction without generating a window spill exception.


[Register layout: CANSAVE; bit boundaries 4, 3, 2, 0; access RW]
FIGURE 5-25 CANSAVE Register


5.6.3 Restorable Windows (CANRESTORE) Register (PR 11)


The privileged CANRESTORE register, shown in FIGURE 5-26, contains the number of
register windows preceding CWP that are in use by the current program and can be
restored (by the RESTORE instruction) without generating a window fill exception.


[Register layout: CANRESTORE; bit boundaries 4, 3, 2, 0; access RW]
FIGURE 5-26 CANRESTORE Register


5.6.4 Clean Windows (CLEANWIN) Register (PR 12)


The privileged CLEANWIN register, shown in FIGURE 5-27, contains the number of
windows that can be used by the SAVE instruction without causing a clean_window
exception.


[Register layout: CLEANWIN; bit boundaries 4, 3, 2, 0; access RW]
FIGURE 5-27 CLEANWIN Register

The CLEANWIN register counts the number of register windows that are "clean"
with respect to the current program; that is, register windows that contain only
zeroes, valid addresses, or valid data from that program. Registers in these windows
need not be cleaned before they can be used. The count includes the register
windows that can be restored (the value in the CANRESTORE register) and the
register windows following CWP that can be used without cleaning. When a clean
window is requested (by a SAVE instruction) and none is available, a clean_window
exception occurs to cause the next window to be cleaned.


Other Windows (OTHERWIN) Register (PR 13)


The privileged OTHERWIN register, shown in FIGURE 5-28, contains the count of
register windows that will be spilled/filled by a separate set of trap vectors based on
the contents of WSTATE.other. If OTHERWIN is zero, register windows are
spilled/filled by use of trap vectors based on the contents of WSTATE.normal.


The OTHERWIN register can be used to split the register windows among different
address spaces and handle spill/fill traps efficiently by use of separate spill/fill
vectors.


[Register layout: OTHERWIN; bit boundaries 4, 3, 2, 0; access RW]
FIGURE 5-28 OTHERWIN Register


Window State (WSTATE) Register (PR 14)


The privileged WSTATE register, shown in FIGURE 5-29, specifies bits that are inserted 
into TT[TL]{4:2} on traps caused by window spill and fill exceptions. These bits are 
used to select one of eight different window spill and fill handlers. If OTHERWIN = 0 
at the time a trap is taken because of a window spill or window fill exception, then 
the WSTATE.normal bits are inserted into TT[TL]. Otherwise, the WSTATE.other bits 
are inserted into TT[TL]. See Register Window State Definition, below, for details of the 
semantics of OTHERWIN. 
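
The selection between WSTATE.normal and WSTATE.other can be sketched in Python as follows. The sketch assumes the layout shown in FIGURE 5-29 (other in bits 5:3, normal in bits 2:0); the function names and the sample trap type value are hypothetical.

```python
def wstate_bits_for_trap(wstate: int, otherwin: int) -> int:
    """Pick the 3-bit field to insert into TT[TL]{4:2} on a window spill or fill trap."""
    normal = wstate & 0x7          # WSTATE.normal, bits 2:0 (assumed layout)
    other = (wstate >> 3) & 0x7    # WSTATE.other, bits 5:3 (assumed layout)
    return normal if otherwin == 0 else other


def insert_into_tt(tt: int, wstate: int, otherwin: int) -> int:
    """Replace bits 4:2 of a trap type value with the selected WSTATE field."""
    selected = wstate_bits_for_trap(wstate, otherwin)
    return (tt & ~0b11100) | (selected << 2)


wstate = (0b101 << 3) | 0b010   # other = 5, normal = 2
print(wstate_bits_for_trap(wstate, otherwin=0), wstate_bits_for_trap(wstate, otherwin=3))
```

With three bits inserted, each of the eight field values selects a different spill or fill handler.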


           RW       RW
WSTATE | other | normal |
        5     3 2      0
FIGURE 5-29 WSTATE Register


Register Window Management


The state of the register windows is determined by the contents of the set of 
privileged registers described in Register-Window PR State Registers on page 81. 
Those registers are affected by the instructions described in Register Window 
Management Instructions on page 116. Privileged software can read/write these state 
registers directly by using RDPR/WRPR instructions. 


5.6.7.1 Register Window State Definition 


For the state of the register windows to be consistent, the following must always be 
true: 


CANSAVE + CANRESTORE + OTHERWIN = N_REG_WINDOWS - 2


FIGURE 5-3 on page 51 shows how the register windows are partitioned to obtain the 
above equation. The partitions are as follows: 


• The current window plus the window that must not be used because it overlaps
  two other valid windows. In FIGURE 5-3, these are windows 0 and 5, respectively.
  They are always present and account for the "2" subtracted from N_REG_WINDOWS
  in the right-hand side of the above equation.

• Windows that do not have valid contents and that can be used (through a SAVE
  instruction) without causing a spill trap. These windows (windows 1-4 in
  FIGURE 5-3) are counted in CANSAVE.

• Windows that have valid contents for the current address space and that can be
  used (through the RESTORE instruction) without causing a fill trap. These
  windows (window 7 in FIGURE 5-3) are counted in CANRESTORE.

• Windows that have valid contents for an address space other than the current
  address space. An attempt to use these windows through a SAVE (RESTORE)
  instruction results in a spill (fill) trap to a separate set of trap vectors, as discussed
  in the following subsection. These windows (window 6 in FIGURE 5-3) are counted
  in OTHERWIN.


In addition,

CLEANWIN ≥ CANRESTORE


since CLEANWIN is the sum of CANRESTORE and the number of clean windows 
following CWP. 
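
Both consistency conditions can be checked mechanically. The Python sketch below uses the partition described for FIGURE 5-3 (windows 1-4 in CANSAVE, window 7 in CANRESTORE, window 6 in OTHERWIN) as its example; the function name and the CLEANWIN value chosen are hypothetical.

```python
N_REG_WINDOWS = 8  # value for UltraSPARC Architecture 2005 processors


def window_state_consistent(cansave: int, canrestore: int, otherwin: int,
                            cleanwin: int, n_reg_windows: int = N_REG_WINDOWS) -> bool:
    """Check the register window state definition."""
    sums_ok = cansave + canrestore + otherwin == n_reg_windows - 2
    clean_ok = cleanwin >= canrestore
    return sums_ok and clean_ok


# Partition from FIGURE 5-3: 4 savable, 1 restorable, 1 other window.
print(window_state_consistent(cansave=4, canrestore=1, otherwin=1, cleanwin=3))
```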


For the window-management features of the architecture described in this section to
be used, the state of the register windows must be kept consistent at all times, except
within the trap handlers for window spilling, filling, and cleaning. While window
traps are being handled, the state may be inconsistent. Window spill/fill trap
handlers should be written so that a nested trap can be taken without destroying
state.


Programming | System software is responsible for keeping the state of the
Note        | register windows consistent at all times. Failure to do so will
            | cause undefined behavior. For example, CANSAVE,
            | CANRESTORE, and OTHERWIN must never be greater than or
            | equal to N_REG_WINDOWS - 1.


5.6.7.2 Register Window Traps 


Window traps are used to manage overflow and underflow conditions in the register 
windows, support clean windows, and implement the FLUSHW instruction. 


See Register Window Traps on page 450 for a detailed description of how fill, spill, and 
clean_window traps support register windowing. 





5.7 Non-Register-Window PR State Registers


The registers described in this section are visible only to software running in
privileged mode (that is, when PSTATE.priv = 1), and may be accessed with the
WRPR and RDPR instructions. (An attempt to execute a WRPR or RDPR instruction
in nonprivileged mode causes a privileged_opcode exception.)


Each virtual processor provides a full set of these state registers.

Implementation | A write to any privileged register, including PR state registers,
Note           | may drain the CPU pipeline.


5.7.1 Trap Program Counter (TPC) Register (PR 0)


The privileged Trap Program Counter register (TPC; FIGURE 5-30) contains the
program counter (PC) from the previous trap level. There are MAXPTL instances of
the TPC, but only one is accessible at any time. The current value in the TL register
determines which instance of the TPC[TL] register is accessible. An attempt to read
or write the TPC register when TL = 0 causes an illegal_instruction exception.


During normal operation, the value of TPC[n], where n is greater than the current
trap level (n > TL), is undefined.

TPC[1]       | pc_high62 (PC{63:2} from trap while TL = 0)          | 00 |
TPC[2]       | pc_high62 (PC{63:2} from trap while TL = 1)          | 00 |
TPC[3]       | pc_high62 (PC{63:2} from trap while TL = 2)          | 00 |
  ...
TPC[MAXPTL]  | pc_high62 (PC{63:2} from trap while TL = MAXPTL - 1) | 00 |
              63                                                   2 1  0
FIGURE 5-30 Trap Program Counter Register Stack


TABLE 5-16 lists the events that cause TPC to be read or written. 


TABLE 5-16 Events that involve TPC, when executing with TL = n.

Event               Effect
Trap                TPC[n + 1] ← PC
RETRY instruction   PC ← TPC[n]
RDPR (TPC)          R[rd] ← TPC[n]
WRPR (TPC)          TPC[n] ← value


5.7.2 Trap Next PC (TNPC) Register (PR 1)


The privileged Trap Next Program Counter register (TNPC; FIGURE 5-31) is the next
program counter (NPC) from the previous trap level. There are MAXPTL instances of
the TNPC, but only one is accessible at any time. The current value in the TL register
determines which instance of the TNPC register is accessible. An attempt to read or
write the TNPC register when TL = 0 causes an illegal_instruction exception.


TNPC[1]  | npc_high62 (NPC{63:2} from trap while TL = 0) | 00 |
TNPC[2]  | npc_high62 (NPC{63:2} from trap while TL = 1) | 00 |
TNPC[3]  | npc_high62 (NPC{63:2} from trap while TL = 2) | 00 |
  ...
          63                                            2 1  0
FIGURE 5-31 Trap Next Program Counter Register Stack


During normal operation, the value of TNPC[n], where n is greater than the current
trap level (n > TL), is undefined.


TABLE 5-17 lists the events that cause TNPC to be read or written.

TABLE 5-17 Events that involve TNPC, when executing with TL = n.

Event               Effect
Trap                TNPC[n + 1] ← NPC
DONE instruction    PC ← TNPC[n]; NPC ← TNPC[n] + 4
RETRY instruction   NPC ← TNPC[n]
RDPR (TNPC)         R[rd] ← TNPC[n]
WRPR (TNPC)         TNPC[n] ← value
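
The effects listed in TABLE 5-16 and TABLE 5-17 can be combined into a small Python model of the TPC/TNPC stacks. This is a sketch only: the class name TrapStack is hypothetical, the decrement of TL on DONE and RETRY is an assumption not stated in the tables, and the new PC loaded from the trap vector is not modeled.

```python
class TrapStack:
    """Toy model of the TPC/TNPC register stacks, indexed by trap level."""

    def __init__(self, maxptl: int):
        self.maxptl = maxptl
        self.tl = 0
        # Index 0 is unused: TPC and TNPC are not accessible when TL = 0.
        self.tpc = [None] * (maxptl + 1)
        self.tnpc = [None] * (maxptl + 1)

    def trap(self, pc: int, npc: int) -> None:
        """Trap at TL = n: TPC[n + 1] <- PC, TNPC[n + 1] <- NPC."""
        n = self.tl
        if n >= self.maxptl:
            raise RuntimeError("no trap level left in this model")
        self.tpc[n + 1] = pc
        self.tnpc[n + 1] = npc
        self.tl = n + 1

    def retry(self) -> tuple:
        """RETRY at TL = n: PC <- TPC[n], NPC <- TNPC[n]."""
        n = self.tl
        pc, npc = self.tpc[n], self.tnpc[n]
        self.tl = n - 1  # assumption: returning lowers the trap level
        return pc, npc

    def done(self) -> tuple:
        """DONE at TL = n: PC <- TNPC[n], NPC <- TNPC[n] + 4."""
        n = self.tl
        pc = self.tnpc[n]
        self.tl = n - 1  # assumption: returning lowers the trap level
        return pc, pc + 4


stack = TrapStack(maxptl=2)
stack.trap(pc=0x1000, npc=0x1004)
print([hex(v) for v in stack.retry()])   # re-executes the trapping instruction
stack.trap(pc=0x1000, npc=0x1004)
print([hex(v) for v in stack.done()])    # skips the trapping instruction
```

The contrast between the two return paths is the point of the model: RETRY resumes at the instruction that trapped, while DONE resumes at the one after it.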





5.7.3 Trap State (TSTATE) Register (PR 2)


The privileged Trap State register (TSTATE; FIGURE 5-32) contains the state from the
previous trap level, comprising the contents of the GL, CCR, ASI, CWP, and PSTATE
registers from the previous trap level. There are MAXPTL instances of the TSTATE
register, but only one is accessible at a time. The current value in the TL register
determines which instance of TSTATE is accessible. An attempt to read or write the
TSTATE register when TL = 0 causes an illegal_instruction exception.


                     RW    RW    RW    R     RW     R    RW
TSTATE[1]          | gl | ccr | asi | - | pstate | - | cwp |   (GL, CCR, ASI, PSTATE, CWP from TL = 0)
TSTATE[2]          | gl | ccr | asi | - | pstate | - | cwp |   (GL, CCR, ASI, PSTATE, CWP from TL = 1)
TSTATE[3]          | gl | ccr | asi | - | pstate | - | cwp |   (GL, CCR, ASI, PSTATE, CWP from TL = 2)
  ...
TSTATE[MAXPTL]     | gl | ccr | asi | - | pstate | - | cwp |   (GL, CCR, ASI, PSTATE, CWP from TL = MAXPTL - 1)
TSTATE[MAXPTL + 1] | gl | ccr | asi |   | pstate |   | cwp |   (GL, CCR, ASI, PSTATE, CWP from TL = MAXPTL)
FIGURE 5-32 Trap State (TSTATE) Register Stack


During normal operation, the value of TSTATE[n], where n is greater than the current
trap level (n > TL), is undefined.


V9 Compatibility | Because bits have been added to the PSTATE register
Note             | in the UltraSPARC Architecture, a 13-bit PSTATE value is stored
                 | in TSTATE instead of the 10-bit value specified in the SPARC V9
                 | architecture.

TABLE 5-19 lists the events that cause TSTATE to be read or written.

TABLE 5-19 Events That Involve TSTATE, When Executing with TL = n

Event               Effect
Trap                TSTATE[n + 1] ← (registers)
DONE instruction    (registers) ← TSTATE[n]
RETRY instruction   (registers) ← TSTATE[n]
RDPR (TSTATE)       R[rd] ← TSTATE[n]
WRPR (TSTATE)       TSTATE[n] ← value





Trap Type (TT) Register (PR 3)


The privileged Trap Type register (TT; see FIGURE 5-33) contains the trap type of the
trap that caused entry to the current trap level. There are MAXPTL instances of the TT
register, but only one is accessible at a time. The current value in the TL register
determines which instance of the TT register is accessible. An attempt to read or
write the TT register when TL = 0 causes an illegal_instruction exception.


TT[1]       | Trap type from trap while TL = 0          |
  ...
TT[MAXPTL]  | Trap type from trap while TL = MAXPTL - 1 |
FIGURE 5-33 Trap Type Register Stack


During normal operation, the value of TT[n], where n is greater than the current trap
level (n > TL), is undefined.


TABLE 5-20 lists the events that cause TT to be read or written.


TABLE 5-20 Events that involve TT, when executing with TL = n.

Event        Effect
Trap         TT[n + 1] ← (trap type)
RDPR (TT)    R[rd] ← TT[n]
WRPR (TT)    TT[n] ← value

Trap Base Address (TBA) Register (PR 5)


The privileged Trap Base Address register (TBA), shown in FIGURE 5-34, provides the 
upper 49 bits (bits 63:15) of the virtual address used to select the trap vector for a 
trap that is to be delivered to privileged mode. The lower 15 bits of the TBA always 
read as zero, and writes to them are ignored. 
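
The read-as-zero, write-ignored behavior of the lower 15 bits amounts to a mask on every write. A minimal Python sketch, with the hypothetical helper name write_tba:

```python
MASK64 = (1 << 64) - 1
TBA_HIGH49_MASK = MASK64 & ~((1 << 15) - 1)   # keeps bits 63:15 only


def write_tba(value: int) -> int:
    """Value held in TBA after a write: bits 14:0 of the write data are ignored."""
    return value & TBA_HIGH49_MASK


print(hex(write_tba(0x1_2345_7FFF)))
```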


          RW                  R
TBA | tba_high49 | 000 0000 0000 0000 |
     63        15 14                 0
FIGURE 5-34 Trap Base Address Register


Details on how the full address for a trap vector is generated, using TBA and other
state, are provided in Trap-Table Entry Address to Privileged Mode on page 433.


5.7.6 Processor State (PSTATE) Register (PR 6)


The privileged Processor State register (PSTATE), shown in FIGURE 5-35, contains 
control fields for the current state of the virtual processor. There is only one instance 
of the PSTATE register per virtual processor. 


[Figure: PSTATE register layout, bits 12:0; the fields cle, tle, mm, pef, am, priv, and ie are each RW]

FIGURE 5-35 PSTATE Field


Writes to PSTATE are nondelayed; that is, new machine state written to PSTATE is 
visible to the next instruction executed. The privileged RDPR and WRPR 
instructions are used to read and write PSTATE, respectively. 


The following subsections describe the fields of the PSTATE register. 


Current Little Endian (cle). This bit affects the endianness of data accesses 
performed using an implicit ASI. When PSTATE.cle = 1, all data accesses using an 
implicit ASI are performed in little-endian byte order. When PSTATE.cle = 0, all data 
accesses using an implicit ASI are performed in big-endian byte order. Specific ASIs 
used are shown in TABLE 6-3 on page 108. Note that the endianness of a data access 
may be further affected by TTE.ie used by the MMU. 


Instruction accesses are unaffected by PSTATE.cle and are always performed in
big-endian byte order.


Trap Little Endian (tle). When a trap is taken, the current PSTATE register is
pushed onto the trap stack.


During a virtual processor trap to privileged mode, the PSTATE.tle bit is copied into
PSTATE.cle in the new PSTATE register. This behavior allows system software to
have a different implicit byte ordering than the current process. Thus, if PSTATE.tle
is set to 1, data accesses using an implicit ASI in the trap handler are little-endian.


The original state of PSTATE.cle is restored when the original PSTATE register is 
restored from the trap stack. 
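
The interaction of cle and tle across a trap can be sketched as below. This is an illustrative model under stated assumptions: `PState`, `take_trap`, `return_from_trap`, and `store_word` are hypothetical names, the trap stack is modeled as a Python list, and only the two endianness bits are represented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PState:
    cle: int = 0   # current little endian
    tle: int = 0   # trap little endian


def take_trap(pstate: PState, trap_stack: list) -> PState:
    """Push the current PSTATE, then copy tle into cle for the handler."""
    trap_stack.append(pstate)
    return replace(pstate, cle=pstate.tle)


def return_from_trap(trap_stack: list) -> PState:
    """Restore the original PSTATE (and therefore the original cle)."""
    return trap_stack.pop()


def store_word(value: int, pstate: PState) -> bytes:
    """Byte image of a 4-byte store that uses an implicit ASI."""
    return value.to_bytes(4, "little" if pstate.cle else "big")
```

With cle = 0 and tle = 1 in the interrupted process, a word store in the handler is laid out little-endian, and the same store after the return is big-endian again.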


Memory Model (mm). This 2-bit field determines the memory model in use by 
the virtual processor. The defined values for an UltraSPARC Architecture virtual 
processor are listed in TABLE 5-21. 


TABLE 5-21 PSTATE.mm Encodings 





mm Value Selected Memory Model 
00 Total Store Order (TSO) 
01 Reserved 
10 Implementation dependent (impl. dep. #113-V9-Ms10) 
11 Implementation dependent (impl. dep. #113-V9-Ms10) 


The current memory model is determined by the value of PSTATE.mm. Software
should refrain from writing the values 01₂, 10₂, or 11₂ to PSTATE.mm because they
are implementation dependent or reserved for future extensions to the architecture,
and in any case not currently portable across implementations.


■ Total Store Order (TSO): Loads are ordered with respect to earlier loads. Stores
are ordered with respect to earlier loads and stores. Thus, loads can bypass earlier
stores but cannot bypass earlier loads; stores cannot bypass earlier loads or stores.


IMPL. DEP. #113-V9-Ms10: Whether memory models represented by
PSTATE.mm = 10₂ or 11₂ are supported in an UltraSPARC Architecture processor is
implementation dependent. If the 10₂ model is supported, then when
PSTATE.mm = 10₂, the implementation must correctly execute software that adheres
to the RMO model described in The SPARC Architecture Manual-Version 9. If the 11₂
model is supported, its definition is implementation dependent.


IMPL. DEP. #119-Ms10: The effect of writing an unimplemented memory model
designation into PSTATE.mm is implementation dependent.


SPARC V9 Compatibility Notes | The PSO memory model described in SPARC V8 and SPARC V9
architecture specifications was never implemented in a SPARC
V9 implementation and is not included in the UltraSPARC
Architecture specification.

The RMO memory model described in the SPARC V9
specification was implemented in some non-Sun SPARC V9
implementations, but is not directly supported in UltraSPARC
Architecture 2005 implementations. All software written to run
correctly under RMO will run correctly under TSO on an
UltraSPARC Architecture 2005 implementation.





Enable FPU (pef). When set to 1, the PSTATE.pef bit enables the floating-point 
unit. This allows privileged software to manage the FPU. For the FPU to be usable, 
both PSTATE.pef and FPRS.fef must be set to 1. Otherwise, any floating-point 
instruction that tries to reference the FPU causes an fp_disabled trap.


If an implementation does not contain a hardware FPU, PSTATE.pef always reads as 
0 and writes to it are ignored. 
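
The two-bit enable condition can be written as a small predicate. The sketch below is illustrative; `pef_read` and `fp_instruction_result` are hypothetical names, and the model ignores every other condition that can affect a floating-point instruction.

```python
def pef_read(has_fpu: bool, written_pef: int) -> int:
    """PSTATE.pef reads as 0 when there is no hardware FPU; writes are then ignored."""
    return (written_pef & 1) if has_fpu else 0


def fp_instruction_result(pstate_pef: int, fprs_fef: int) -> str:
    """The FPU is usable only when both PSTATE.pef and FPRS.fef are 1."""
    if pstate_pef == 1 and fprs_fef == 1:
        return "executes"
    return "fp_disabled"
```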


Address Mask (am). The PSTATE.am bit is provided to allow 32-bit SPARC 
software to run correctly on a 64-bit SPARC V9 processor, by masking out (zeroing) 
bits 63:32 of virtual addresses at appropriate times. 


When PSTATE.am = 0, the full 64 bits of all instruction and data addresses are
preserved at all times. 


When PSTATE.am = 1, bits 63:32 of instruction and data virtual addresses are
masked out (treated as 0). 
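
As a minimal sketch of the masking rule (the function name `masked_va` is illustrative, and the cases in which masking is applied or suppressed are not modeled):

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def masked_va(va: int, am: int) -> int:
    """Return the virtual address as seen after PSTATE.am masking."""
    va &= MASK64
    return va & MASK32 if am == 1 else va
```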


Programming Note | It is the responsibility of privileged software to manage the
setting of the PSTATE.am bit, since hardware masks virtual
addresses when PSTATE.am = 1.

Misuse of the PSTATE.am bit can result in undesirable behavior.
PSTATE.am should not be set to 1 in privileged mode.

The PSTATE.am bit should always be set to 1 when 32-bit
software is executed.





Instances in which the more-significant 32 bits of a virtual address are masked 
include: 


■ Before any data address is sent out of the virtual processor (notably, to the
memory system, which includes MMU, internal caches, and external caches)

■ Before any instruction address is sent out of the virtual processor (notably, to the
memory system, which includes MMU, internal caches, and external caches)

■ When the value of PC is stored to a general-purpose register by a CALL, JMPL, or
RDPC instruction (closed impl.dep. #125-V9-Cs10)

■ When the values of PC and NPC are written to TPC[TL] and TNPC[TL]
(respectively) during a trap (closed impl.dep. #125-V9-Cs10)

■ Before any virtual address is sent to a watchpoint comparator


Programming Note | A 64-bit comparison is always used when performing a masked
watchpoint address comparison with the Instruction or Data VA
watchpoint register. When PSTATE.am = 1, the more significant
32 bits of the VA watchpoint register must be zero for a match
(and resulting trap) to occur.


When PSTATE.am = 1, the more-significant 32 bits of a virtual address are explicitly 
preserved and not masked out in the following cases: 


■ When a target address is written to NPC by a control transfer instruction

Forward Compatibility Note | This behavior is expected to change in the next revision of the
architecture, such that implementations will explicitly mask out
(not preserve) the more-significant 32 bits in this case.

■ When NPC is incremented to NPC + 4 during execution of an instruction that is
not a taken control transfer

Forward Compatibility Note | This behavior is expected to change in the next revision of the
architecture, such that implementations will explicitly mask out
(not preserve) the more-significant 32 bits in this case.

■ When a WRPR instruction writes to TPC[TL] or TNPC[TL]

Programming Note | Since writes to PSTATE are nondelayed (see page 90), a change
to PSTATE.am can affect which instruction is executed
immediately after the write to PSTATE.am. Specifically, if a
WRPR to the PSTATE register changes the value of PSTATE.am
from 0 to 1, and NPC{63:32} when the WRPR began execution
was nonzero, then the next instruction executed after the WRPR
will be from the address indicated in NPC{31:0} (with the
more-significant 32 address bits set to zero).

■ When a RDPR instruction reads from TPC[TL] or TNPC[TL]





If (1) TSTATE[TL].pstate.am = 1 and (2) a DONE or RETRY instruction is executed
(which sets PSTATE.am to 1 by restoring the value from TSTATE[TL].pstate.am to
PSTATE.am), it is implementation dependent whether the DONE or RETRY instruction
masks (zeroes) the more-significant 32 bits of the values it places into PC and NPC
(impl. dep. #417-S10).


Programming Note | Because of implementation dependency #417-S10, great care
must be taken in trap handler software if
TSTATE[TL].pstate.am = 1 and the trap handler wishes to write
a nonzero value to the more-significant 32 bits of TPC[TL] or
TNPC[TL].


Privileged Mode (priv). When PSTATE.priv = 1, the virtual processor is operating 
in privileged mode. 


When PSTATE.priv = 0, the processor is operating in nonprivileged mode.


PSTATE interrupt enable (ie). PSTATE.ie controls when the virtual processor
can take traps due to disrupting exceptions (such as interrupts or errors unrelated to
instruction processing).


Outstanding disrupting exceptions that are destined for privileged mode can only 
cause a trap when the virtual processor is in nonprivileged or privileged mode and 
PSTATE.ie = 1. At all other times, they are held pending. For more details, see 
Conditioning of Disrupting Traps on page 429. 


SPARC V9 Compatibility Note | Since the UltraSPARC Architecture provides a more general
“alternate globals” facility (through use of the GL register) than
does SPARC V9, an UltraSPARC Architecture processor does not
implement the SPARC V9 PSTATE.ag bit.


5.7.7 Trap Level Register (TL^P) (PR 7)


The privileged Trap Level register (TL; FIGURE 5-36) specifies the current trap level. 
TL = 0 is the normal (nontrap) level of operation. TL > 0 implies that one or more 
traps are being processed. 


FIGURE 5-36 Trap Level Register 


The maximum valid value that the TL register may contain is MAXPTL, which is 
always equal to the number of supported trap levels beyond level 0. 


IMPL. DEP. #101-V9-CS10: The architectural parameter MAXPTL is a constant for
each implementation; its legal values are from 2 to 6 (supporting from 2 to 6 levels of
saved trap state). In a typical implementation MAXPTL = MAXPGL (see impl. dep.
#401-S10). Architecturally, MAXPTL must be ≥ 2.


In an UltraSPARC Architecture 2005 implementation, MAXPTL = 2. See Chapter 12,
Traps, for more details regarding the TL register.


The effect of writing to TL with a WRPR instruction is summarized in TABLE 5-22. 


TABLE 5-22 Effect of WRPR of Value x to Register TL

                             Privilege Level when Executing WRPR
Value x Written with WRPR    Nonprivileged                  Privileged

x ≤ MAXPTL                   privileged_opcode exception    TL ← x
x > MAXPTL                   privileged_opcode exception    TL ← MAXPTL
                                                            (no exception generated)





Writing the TL register with a WRPR instruction does not alter any other machine 
state; that is, it is not equivalent to taking a trap or returning from a trap. 
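
The effect of a WRPR to TL, as given in TABLE 5-22 and in the notes in this section, can be sketched as below. This is a model under stated assumptions, not a definitive implementation: `wrpr_tl` and `PrivilegedOpcode` are hypothetical names, and the `implemented_bits` parameter (defaulting to 3) is an assumption used to express how many low-order bits of TL an implementation provides.

```python
class PrivilegedOpcode(Exception):
    """Stands in for the privileged_opcode exception."""


def wrpr_tl(x: int, maxptl: int, privileged: bool, implemented_bits: int = 3) -> int:
    """Return the value stored in TL by a WRPR of x."""
    if not privileged:
        raise PrivilegedOpcode("privileged_opcode")
    # Assumption: data bits above the implemented TL bits are ignored.
    x &= (1 << implemented_bits) - 1
    # Values above MAXPTL saturate at MAXPTL; no exception is generated.
    return min(x, maxptl)
```

With MAXPTL = 2 and three implemented bits, writing any value from 3 to 7 stores 2. With only two implemented bits, writing 05₁₆ stores 1, because bit 2 of the data is ignored before the comparison with MAXPTL.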


Programming Note | An UltraSPARC Architecture implementation only needs to
implement sufficient bits in the TL register to encode the
maximum trap level value. In an implementation where
MAXPTL ≤ 3, bits 63:2 of data written to the TL register
using the WRPR instruction are ignored; only the least-significant
two bits (bits 1:0) of TL are actually written. For
example, if MAXPTL = 2, writing a value of 05₁₆ to the TL register
causes a value of 1₁₆ to actually be stored in TL.


Implementation Note | MAXPTL = 2 for all UltraSPARC Architecture 2005 processors.
Writing a value between 3 and 7 to the TL register in privileged
mode causes a 2 to be stored in TL.


Programming Note | Although it is possible for privileged software to set TL > 0 for
nonprivileged software (by executing a WRPR to TSTATE followed by a DONE or
RETRY instruction), an UltraSPARC Architecture virtual
processor’s behavior when executing with TL > 0 in
nonprivileged mode is undefined.


5.7.8 Processor Interrupt Level (PIL^P) Register (PR 8)


The privileged Processor Interrupt Level register (PIL; see FIGURE 5-37) specifies the 
interrupt level above which the virtual processor will accept an interrupt_level_n 
interrupt. Interrupt priorities are mapped so that interrupt level 2 has greater 
priority than interrupt level 1, and so on. See TABLE 12-4 on page 436 for a list of 
exception and interrupt priorities.


[Figure: PIL register, bits 3:0, RW]

FIGURE 5-37 Processor Interrupt Level Register


V9 Compatibility Note | On SPARC V8 processors, the level 15 interrupt is considered to
be nonmaskable, so it has different semantics from other
interrupt levels. SPARC V9 processors do not treat a level 15
interrupt differently from other interrupt levels.


5.7.9 Global Level Register (GL^P) (PR 16)


The privileged Global Level (GL) register selects which set of global registers is 
visible at any given time. 


FIGURE 5-38 illustrates the Global Level register. 


FIGURE 5-38 Global Level Register, GL 


When a trap occurs, GL is stored in TSTATE[TL].gl, GL is incremented, and a new set 
of global registers (R[1] through R[7]) becomes visible. A DONE or RETRY 
instruction restores the value of GL from TSTATE[TL]. 


The valid range of values that the GL register may contain is 0 to MAXPGL, where 
MAXPGL is one fewer than the number of global register sets available to the virtual 
processor. 


IMPL. DEP. #401-S10: The architectural parameter MAXPGL is a constant for each
implementation; its legal values are from 2 to 7 (supporting from 3 to 8 sets of global
registers). In a typical implementation MAXPGL = MAXPTL (see impl. dep.
#101-V9-CS10). Architecturally, MAXPGL must be ≥ 2.


In all UltraSPARC Architecture 2005 implementations, MAXPGL = 2 (impl. dep.
#401-S10).


IMPL. DEP. #400-S10: Although GL is defined as a 3-bit register, an implementation 
may implement any subset of those bits sufficient to encode the values from 0 to 
MAXPGL for that implementation. If any bits of GL are not implemented, they read as 
zero and writes to them are ignored.


GL operates similarly to TL, in that it increments during entry to a trap, but the
values of GL and TL are independent. That is, TL = n does not imply that GL = n,
and GL = n does not imply that TL = n. Furthermore, there may be a different total
number of global levels (register sets) than there are trap levels; that is, MAXPTL and
MAXPGL are not necessarily equal.


The GL register can be accessed directly with the RDPR and WRPR instructions (as 
privileged register number 16). Writing the GL register directly with WRPR will 
change the set of global registers visible to all instructions subsequent to the WRPR. 


In privileged mode, attempting to write a value greater than MAXPGL to the GL 
register causes MAXPGL to be written to GL. 
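
The selection of a global register set by GL, and the saturation of privileged writes at MAXPGL, can be sketched as follows. The class `Globals` and its method names are hypothetical; the sketch models privileged-mode writes only and relies on the standard SPARC convention that R[0] always reads as zero.

```python
class Globals:
    """Toy model of MAXPGL + 1 sets of global registers selected by GL."""

    def __init__(self, maxpgl: int = 2):
        self.maxpgl = maxpgl
        self.gl = 0
        # One list of 8 registers per global level; index 0 is unused (R[0] is zero).
        self.sets = [[0] * 8 for _ in range(maxpgl + 1)]

    def wrpr_gl(self, x: int) -> None:
        """Privileged WRPR to GL: values above MAXPGL are stored as MAXPGL."""
        self.gl = min(x, self.maxpgl)

    def read(self, r: int) -> int:
        return 0 if r == 0 else self.sets[self.gl][r]

    def write(self, r: int, value: int) -> None:
        if r != 0:
            self.sets[self.gl][r] = value
```

A value written to R[1] at GL = 0 is not visible after GL is changed to 1, and becomes visible again once GL is written back to 0.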


The effect of writing to GL with a WRPR instruction is summarized in TABLE 5-23. 


TABLE 5-23 Effect of WRPR to Register GL

                             Privilege Level when WRPR Is Executed
Value x Written with WRPR    Nonprivileged                  Privileged

x ≤ MAXPGL                   privileged_opcode exception    GL ← x
x > MAXPGL                   privileged_opcode exception    GL ← MAXPGL
                                                            (no exception generated)

Since TSTATE itself is software-accessible, it is possible that when a DONE or 
RETRY is executed to return from a trap handler, the value of GL restored from 
TSTATE[TL] will be different from that which was saved into TSTATE[TL] when the 
trap occurred.


CHAPTER 6


Instruction Set Overview





Instructions are fetched by the virtual processor from memory and are executed, 
annulled, or trapped. Instructions are encoded in 4 major formats and partitioned 
into 11 general categories. Instructions are described in the following sections: 


■ Instruction Execution on page 99.
■ Instruction Formats on page 100.
■ Instruction Categories on page 101.





6.1 Instruction Execution


The instruction at the memory location specified by the program counter is fetched 
and then executed. Instruction execution may change program-visible virtual 
processor and/or memory state. As a side effect of its execution, new values are 
assigned to the program counter (PC) and the next program counter (NPC). 


An instruction may generate an exception if it encounters some condition that makes 
it impossible to complete normal execution. Such an exception may in turn generate 
a precise trap. Other events may also cause traps: an exception caused by a previous 
instruction (a deferred trap), an interrupt or asynchronous error (a disrupting trap), 
or a reset request (a reset trap). If a trap occurs, control is vectored into a trap table. 
See Chapter 12, Traps, for a detailed description of exception and trap processing. 


If a trap does not occur and the instruction is not a control transfer, the next program 
counter is copied into the PC, and the NPC is incremented by 4 (ignoring arithmetic 
overflow if any). There are two types of control-transfer instructions (CTIs): delayed 
and immediate. For a delayed CTI, at the end of the execution of the instruction, 
NPC is copied into the PC and the target address is copied into NPC. For an 
immediate CTI, at the end of execution, the target is copied to PC and target + 4 is 
copied to NPC. In the SPARC instruction set, many CTIs do not transfer control until 
after a delay of one instruction, hence the term “delayed CTI” (DCTI). Thus, the two
program counters provide for a delayed-branch execution model.


For each instruction access and each normal data access, an 8-bit address space
identifier (ASI) is appended to the 64-bit memory address. Load/store alternate
instructions (see Address Space Identifiers (ASIs) on page 108) can provide an arbitrary
ASI with their data addresses or can use the ASI value currently contained in the
ASI register. 
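
The PC and NPC update rules described in this section can be sketched as a single step function. This is an illustrative model only: `advance` and its `kind` argument are hypothetical, and traps and annulment are not modeled.

```python
MASK64 = (1 << 64) - 1


def advance(pc: int, npc: int, kind: str = "none", target: int = 0):
    """Return the (PC, NPC) pair in effect after one instruction completes.

    kind is "none" for a non-CTI, "delayed" for a delayed CTI, or
    "immediate" for an immediate CTI.
    """
    if kind == "none":
        # NPC is copied into PC; NPC is incremented by 4, ignoring overflow.
        return npc, (npc + 4) & MASK64
    if kind == "delayed":
        # NPC is copied into PC and the target is copied into NPC.
        return npc, target & MASK64
    if kind == "immediate":
        # The target is copied to PC and target + 4 to NPC.
        return target & MASK64, (target + 4) & MASK64
    raise ValueError(f"unknown instruction kind: {kind}")
```

Applying `advance` twice after a delayed CTI shows the delay slot: the instruction at the old NPC executes first, and only then does PC reach the target.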





6.2 Instruction Formats 


Every instruction is encoded in a single 32-bit word. The most typical 32-bit
formats are shown in FIGURE 6-1. For detailed formats for specific
instructions, see individual instruction descriptions in the Instructions chapter. 
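
As a minimal sketch of how the op field in bits 31:30 selects a format, the decoder below extracts the fields of the formats summarized in FIGURE 6-1. The function names are illustrative. The field names rd, op3, rs1, rs2, i, and simm13 are the customary SPARC V9 names for those bit ranges and are assumptions here to the extent they do not appear in the figure. For op = 00₂ only the SETHI-style layout is shown; branch formats subdivide the same bits differently.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi:lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word: int) -> dict:
    op = bits(word, 31, 30)
    if op == 0b01:                       # CALL
        return {"op": op, "disp30": bits(word, 29, 0)}
    if op == 0b00:                       # SETHI-style layout
        return {"op": op, "rd": bits(word, 29, 25),
                "op2": bits(word, 24, 22), "imm22": bits(word, 21, 0)}
    # op = 10 or 11: arithmetic, logical, loads, stores, and so on
    fields = {"op": op, "rd": bits(word, 29, 25), "op3": bits(word, 24, 19),
              "rs1": bits(word, 18, 14), "i": bits(word, 13, 13)}
    if fields["i"]:
        fields["simm13"] = bits(word, 12, 0)   # raw field, not sign-extended
    else:
        fields["imm_asi"] = bits(word, 12, 5)
        fields["rs2"] = bits(word, 4, 0)
    return fields
```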


[Figure: instruction format diagrams, one group per value of op (bits 31:30)]

op = 00₂: SETHI, Branches, and ILLTRAP
    fields shown: 00, rd, op2, imm22
    bit boundaries shown: 31 30 29 28 27 25 24 22 21 20 19 18 14 13 0

op = 01₂: CALL
    fields shown: op (bits 31:30), disp30 (bits 29:0)

op = 10₂ or 11₂: Arithmetic, Logical, Moves, Tcc, Loads, Stores, Prefetch, and Misc
    bit boundaries shown: 31 30 29 25 24 19 18 14 13 12 5 4 0; imm_asi occupies bits 12:5


FIGURE 6-1 Summary of Instruction Formats


6.3 Instruction Categories


UltraSPARC Architecture instructions can be grouped into the following categories:


■ Memory access
■ Memory synchronization
■ Integer arithmetic
■ Control transfer (CTI)
■ Conditional moves
■ Register window management
■ State register access
■ Privileged register access
■ Floating-point operate
■ Implementation dependent
■ Reserved


These categories are described in the following subsections.


6.3.1 Memory Access Instructions


Load, store, load-store, and PREFETCH instructions are the only instructions that
access memory. All of the memory access instructions except CASA, CASXA, and
Partial Store use either two R registers or an R register and simm13 to calculate a
64-bit byte memory address. The exceptions form the address differently; for example,
Compare and Swap uses a single R register to specify a 64-bit byte memory address.
To this 64-bit address, an ASI is appended that encodes address space information.


The destination field of a memory reference instruction specifies the R or F 
register(s) that supply the data for a store or that receive the data from a load or 
LDSTUB. For SWAP, the destination register identifies the R register to be 
exchanged atomically with the calculated memory location. For Compare and Swap, 
an R register is specified, the value of which is compared with the value in memory 
at the computed address. If the values are equal, then the destination field specifies 
the R register that is to be exchanged atomically with the addressed memory 
location. If the values are unequal, then the destination field specifies the R register 
that is to receive the value at the addressed memory location; in this case, the 
addressed memory location remains unchanged. LDFSR/LDXFSR and STFSR/ 
STXFSR are special load and store instructions that load or store the floating-point 
status register, FSR, instead of acting on an R or F register. 


The destination field of a PREFETCH instruction (fcn) is used to encode the type of
the prefetch.


Memory is byte (8-bit) addressable. Integer load and store instructions support byte,
halfword (2 bytes), word (4 bytes), and doubleword/extended-word (8 bytes)
accesses. Floating-point load and store instructions support word, doubleword, and
quadword memory accesses. LDSTUB accesses bytes, SWAP accesses words, CASA
accesses words, and CASXA accesses doublewords. The LDTXA (load
twin-extended-word) instruction accesses a quadword (16 bytes) in memory. Block loads
and stores access 64-byte aligned data. PREFETCH accesses at least 64 bytes.


Programming Note | For some instructions, by use of simm13, any location in the
lowest or highest 4 Kbytes of an address space can be accessed
without the use of a register to hold part of the address. 
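
The reach of simm13 can be checked with a short calculation. The sketch below assumes the base register is R[0], which always reads as zero in SPARC, and that the 13-bit field is sign-extended before being added with 64-bit wraparound; the function names are illustrative.

```python
MASK64 = (1 << 64) - 1


def sign_extend13(field: int) -> int:
    """Interpret a 13-bit field as a signed value in the range -4096..4095."""
    field &= 0x1FFF
    return field - 0x2000 if field & 0x1000 else field


def effective_address(rs1_value: int, simm13_field: int) -> int:
    """Byte address formed from R[rs1] + sign_ext(simm13), modulo 2**64."""
    return (rs1_value + sign_extend13(simm13_field)) & MASK64
```

With a zero base, nonnegative simm13 values cover addresses 0 through 4095 (the lowest 4 Kbytes) and negative values cover the highest 4 Kbytes of the 64-bit address space.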


6.3.1.1 Memory Alignment Restrictions 


A halfword access must be aligned on a 2-byte boundary, a word access (including
an instruction fetch) must be aligned on a 4-byte boundary, an extended-word (LDX,
LDXA, STX, STXA) or integer twin word (LDTW, LDTWA, STTW, STTWA) access
must be aligned on an 8-byte boundary, an integer twin-extended-word (LDTXA)
access must be aligned on a 16-byte boundary, and a Block Load (LDBLOCKF) or
Store (STBLOCKF) access must be aligned on a 64-byte boundary.


A floating-point doubleword access (LDDF, LDDFA, STDF, STDFA) should be 
aligned on an 8-byte boundary, but is only required to be aligned on a word (4-byte) 
boundary. A floating-point doubleword access to an address that is 4-byte aligned 
but not 8-byte aligned may result in less efficient and nonatomic access (causes a 
trap and is emulated in software (impl. dep. #109-V9-Cs10)), so 8-byte alignment is 
recommended. 


A floating-point quadword access (LDQF, LDQFA, STQF, STQFA) should be aligned
on a 16-byte boundary, but is only required to be aligned on a word (4-byte) 
boundary. A floating-point quadword access to an address that is 4-byte or 8-byte 
aligned but not 16-byte aligned may result in less efficient and nonatomic access 
(causes a trap and is emulated in software (impl. dep. #111-V9-Cs10)), so 16-byte 
alignment is recommended. 


An improperly aligned address in a load, store, or load-store instruction causes a
mem_address_not_aligned exception to occur, with these exceptions:

■ An LDDF or LDDFA instruction accessing an address that is word aligned but not
doubleword aligned may cause an LDDF_mem_address_not_aligned exception
(impl. dep. #109-V9-Cs10).

■ An STDF or STDFA instruction accessing an address that is word aligned but not
doubleword aligned may cause an STDF_mem_address_not_aligned exception
(impl. dep. #110-V9-Cs10).

■ An LDQF or LDQFA instruction accessing an address that is word aligned but not
quadword aligned may cause an LDQF_mem_address_not_aligned exception
(impl. dep. #111-V9-Cs10a).

Implementation Note | Although the architecture provides for the
LDQF_mem_address_not_aligned exception, UltraSPARC
Architecture 2005 implementations do not currently generate it.

■ An STQF or STQFA instruction accessing an address that is word aligned but not
quadword aligned may cause an STQF_mem_address_not_aligned exception
(impl. dep. #112-V9-Cs10a).

Implementation Note | Although the architecture provides for the
STQF_mem_address_not_aligned exception, UltraSPARC
Architecture 2005 implementations do not currently generate it. 
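
The alignment rules of this section can be summarized in a small classifier. This is a sketch under stated assumptions: the access-kind names and the result string "implementation dependent" are labels chosen for illustration, and the model does not say which specific exception, if any, an implementation raises in the implementation-dependent cases.

```python
# Required alignment in bytes for accesses that have a single hard rule.
REQUIRED = {"byte": 1, "halfword": 2, "word": 4, "extended": 8,
            "twin_word": 8, "twin_extended": 16, "block": 64}

# Floating-point accesses: (required alignment, recommended alignment).
FP_RELAXED = {"fp_double": (4, 8), "fp_quad": (4, 16)}


def classify(kind: str, addr: int) -> str:
    """Classify an access as ok, misaligned, or implementation dependent."""
    if kind in REQUIRED:
        return "ok" if addr % REQUIRED[kind] == 0 else "mem_address_not_aligned"
    required, recommended = FP_RELAXED[kind]
    if addr % required != 0:
        return "mem_address_not_aligned"
    if addr % recommended != 0:
        return "implementation dependent"
    return "ok"
```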


6.3.1.2 Addressing Conventions 


An UltraSPARC Architecture virtual processor uses big-endian byte order for all 
instruction accesses and, by default, for data accesses. It is possible to access data in 
little-endian format by use of selected ASIs. It is also possible to change the default 
byte order for implicit data accesses. See Processor State (PSTATE^P) Register (PR 6) on
page 90 for more information. [1]


Big-endian Addressing Convention. Within a multiple-byte integer, the byte 
with the smallest address is the most significant; a byte's significance decreases as its 
address increases. The big-endian addressing conventions are described in TABLE 6-1 
and illustrated in FIGURE 6-2. 


TABLE 6-1 Big-endian Addressing Conventions

Term              Definition

byte              A load/store byte instruction accesses the addressed byte in both big- and
                  little-endian modes.

halfword          For a load/store halfword instruction, two bytes are accessed. The most
                  significant byte (bits 15-8) is accessed at the address specified in the
                  instruction; the least significant byte (bits 7-0) is accessed at the
                  address + 1.

word              For a load/store word instruction, four bytes are accessed. The most
                  significant byte (bits 31-24) is accessed at the address specified in the
                  instruction; the least significant byte (bits 7-0) is accessed at the
                  address + 3.

doubleword or     For a load/store extended or floating-point load/store double instruction,
extended word     eight bytes are accessed. The most significant byte (bits 63:56) is accessed
                  at the address specified in the instruction; the least significant byte (bits
                  7:0) is accessed at the address + 7.
                  For the deprecated integer load/store twin word instructions (LDTW,
                  LDTWA†, STTW, STTWA), two big-endian words are accessed. The word
                  at the address specified in the instruction corresponds to the even register
                  specified in the instruction; the word at address + 4 corresponds to the
                  following odd-numbered register.
                  † Note that the LDTXA instruction, which is not an LDTWA operation but
                  does share LDTWA's opcode, is not deprecated.

quadword          For a load/store quadword instruction, 16 bytes are accessed. The most
                  significant byte (bits 127-120) is accessed at the address specified in the
                  instruction; the least significant byte (bits 7-0) is accessed at the
                  address + 15.


1. Readers interested in more background information on big- vs. little-endian can also refer to Cohen, D., “On
Holy Wars and a Plea for Peace,” Computer 14:10 (October 1981), pp. 48-54.


[Figure: byte, halfword, word, doubleword/extended word, and quadword layouts, with each byte labeled by its low-order address bits (Address{0}, Address{1:0}, Address{2:0}, Address{3:0}) and the data bits it holds]

FIGURE 6-2 Big-endian Addressing Conventions




Little-endian Addressing Convention. Within a multiple-byte integer, the byte 
with the smallest address is the least significant; a byte’s significance increases as its 
address increases. The little-endian addressing conventions are defined in TABLE 6-2 
and illustrated in FIGURE 6-3. 
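
The big-endian and little-endian conventions of TABLE 6-1 and TABLE 6-2 can be illustrated with a few loads from a byte array. The helper names `load` and `load_twin_word` are hypothetical; the twin-word helper models the documented behavior that each 32-bit word is byte-swapped independently in little-endian mode.

```python
def load(memory: bytes, addr: int, size: int, little_endian: bool = False) -> int:
    """Load `size` bytes at `addr` and assemble them into an integer."""
    order = "little" if little_endian else "big"
    return int.from_bytes(memory[addr:addr + size], order)


def load_twin_word(memory: bytes, addr: int, little_endian: bool = False):
    """Model LDTW: returns (even register, odd register) as two 32-bit loads."""
    even = load(memory, addr, 4, little_endian)
    odd = load(memory, addr + 4, 4, little_endian)
    return even, odd
```

For memory holding the bytes 10, 11, 12, ... (hexadecimal) from address 0, a big-endian word load at address 0 yields 10111213₁₆ and a little-endian word load yields 13121110₁₆.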


TABLE 6-2 Little-endian Addressing Convention

Term              Definition

byte              A load/store byte instruction accesses the addressed byte in both big-
                  and little-endian modes.

halfword          For a load/store halfword instruction, two bytes are accessed. The least
                  significant byte (bits 7-0) is accessed at the address specified in the
                  instruction; the most significant byte (bits 15-8) is accessed at the
                  address + 1.

word              For a load/store word instruction, four bytes are accessed. The least
                  significant byte (bits 7-0) is accessed at the address specified in the
                  instruction; the most significant byte (bits 31-24) is accessed at the
                  address + 3.

doubleword or     For a load/store extended or floating-point load/store double
extended word     instruction, eight bytes are accessed. The least significant byte (bits 7-0)
                  is accessed at the address specified in the instruction; the most significant
                  byte (bits 63-56) is accessed at the address + 7.
                  For the deprecated integer load/store twin word instructions (LDTW,
                  LDTWA†, STTW, STTWA), two little-endian words are accessed. The
                  word at the address specified in the instruction corresponds to the even
                  register in the instruction; the word at the address specified in the
                  instruction + 4 corresponds to the following odd-numbered register. With
                  respect to little-endian memory, an LDTW/LDTWA (STTW/STTWA)
                  instruction behaves as if it is composed of two 32-bit loads (stores), each
                  of which is byte-swapped independently before being written into each
                  destination register (memory word).
                  † Note that the LDTXA instruction, which is not an LDTWA operation but
                  does share LDTWA's opcode, is not deprecated.

quadword          For a load/store quadword instruction, 16 bytes are accessed. The least
                  significant byte (bits 7-0) is accessed at the address specified in the
                  instruction; the most significant byte (bits 127-120) is accessed at the
                  address + 15.


[Figure: byte, halfword, word, doubleword/extended word, and quadword layouts, with each byte labeled by its low-order address bits (Address{0}, Address{1:0}, Address{2:0}, Address{3:0}) and the data bits it holds]

FIGURE 6-3 Little-endian Addressing Conventions


6.3.1.3 Address Space Identifiers (ASIs) 


Alternate-space load, store, and load-store instructions specify an explicit ASI to use 
for their data access; when i = 0, the explicit ASI is provided in the instruction’s 
imm_asi field, and when i = 1, it is provided in the ASI register. 


Non-alternate-space load, store, and load-store instructions use an implicit ASI value 
that depends on the current trap level (TL) and the value of PSTATE.cle. Instruction 
fetches use an implicit ASI that depends only on the current trap level. The cases are 
enumerated in TABLE 6-3. 
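
The selection of an implicit ASI from TL and PSTATE.cle can be sketched as a lookup function. The function name `implicit_asi` and the access labels "ifetch" and "data" are illustrative; the returned strings are the ASI names used in TABLE 6-3.

```python
def implicit_asi(access: str, tl: int, cle: int = 0) -> str:
    """Return the implicit ASI name for an instruction fetch or a
    non-alternate-space load, store, or load-store."""
    base = "ASI_PRIMARY" if tl == 0 else "ASI_NUCLEUS"
    if access == "ifetch":
        return base                      # depends only on the trap level
    if access == "data":
        return base + ("_LITTLE" if cle == 1 else "")
    raise ValueError(f"unknown access type: {access}")
```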


TABLE 6-3 ASIs Used for Data Accesses and Instruction Fetches

Access Type                     TL     PSTATE.cle    ASI Used

Instruction Fetch               = 0    any           ASI_PRIMARY
                                > 0    any           ASI_NUCLEUS*

Non-alternate-space Load,       = 0    0             ASI_PRIMARY
Store, or Load-Store                   1             ASI_PRIMARY_LITTLE
                                > 0    0             ASI_NUCLEUS*
                                       1             ASI_NUCLEUS_LITTLE**

Alternate-space Load,           any    any           ASI explicitly specified in the instruction
Store, or Load-Store                                 (subject to privilege-level restrictions)

*On some early SPARC V9 implementations, ASI_PRIMARY may have been used for this case.
**On some early SPARC V9 implementations, ASI_PRIMARY_LITTLE may have been used for this case.


See also Memory Addressing and Alternate Address Spaces on page 381.


ASIs 00₁₆-7F₁₆ are restricted; only software with sufficient privilege is allowed to
access them. An attempt to access a restricted ASI by insufficiently-privileged
software results in a privileged_action exception (impl. dep #103-V9-Ms10(6)). ASIs
80₁₆ through FF₁₆ are unrestricted; software is allowed to access them regardless of
the virtual processor’s privilege mode, as summarized in TABLE 6-4. 
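
The access check summarized in TABLE 6-4 reduces to a comparison on the ASI value and PSTATE.priv. The following is a minimal sketch; `asi_access` is a hypothetical name, and only the nonprivileged and privileged modes covered by this section are modeled.

```python
def asi_access(asi: int, priv: int) -> str:
    """Return the result of an access to `asi` with PSTATE.priv = priv."""
    if not 0 <= asi <= 0xFF:
        raise ValueError("an ASI is an 8-bit value")
    if asi <= 0x7F and priv == 0:
        return "privileged_action"       # restricted ASI, nonprivileged mode
    return "valid"
```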


TABLE 6-4 Allowed Accesses to ASIs

                              Processor Mode
Value        Access Type      (PSTATE.priv)        Result of ASI Access
00₁₆-7F₁₆    Restricted       Nonprivileged (0)    privileged_action exception
                              Privileged (1)       Valid access
80₁₆-FF₁₆    Unrestricted     Nonprivileged (0)    Valid access
                              Privileged (1)       Valid access
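
As a minimal sketch of TABLE 6-4, the check below classifies an ASI access by ASI value and PSTATE.priv. The function name asi_access_result and its return strings are assumptions made for illustration; only the two modes shown in the table are modeled.

```python
def asi_access_result(asi: int, priv: int) -> str:
    """Classify an ASI access per TABLE 6-4.

    asi:  8-bit ASI value (0x00 to 0xFF).
    priv: PSTATE.priv (0 = nonprivileged, 1 = privileged).
    """
    if not 0 <= asi <= 0xFF:
        raise ValueError("ASI specifiers are 8 bits wide")
    restricted = asi <= 0x7F          # bit 7 of the ASI is 0
    if restricted and priv == 0:
        return "privileged_action"    # exception name from the text
    return "valid"
```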





IMPL. DEP. #29-V8: Some UltraSPARC Architecture 2005 ASIs are implementation 
dependent. See TABLE 10-1 on page 401 for details. 


V9 Compatibility | In SPARC V9, many ASIs were defined to be implementation 
Note | dependent. 


An UltraSPARC Architecture implementation decodes all 8 bits of ASI specifiers 
(impl. dep. #30-V8-Cu3). 


V9 Compatibility | In SPARC V9, an implementation could choose to decode only a
Note | subset of the 8-bit ASI specifier.





6.3.1.4 Separate Instruction Memory 


A SPARC V9 implementation may choose to access instruction and data through the 
same address space and use hardware to keep data and instruction memory 
consistent at all times. It may also choose to overload independent address spaces 
for data and instructions and allow them to become inconsistent when data writes 
are made to addresses shared with the instruction space. 


Programming | A SPARC V9 program containing self-modifying code should 
Note | use FLUSH instruction(s) after executing stores to modify 
instruction memory and before executing the modified 
instruction(s), to ensure the consistency of program execution.


6.3.2 Memory Synchronization Instructions


Two forms of memory barrier (MEMBAR) instructions allow programs to manage
the order and completion of memory references. Ordering MEMBARs induce a
partial ordering between sets of loads and stores and future loads and stores.
Sequencing MEMBARs exert explicit control over completion of loads and stores (or
other instructions). Both barrier forms are encoded in a single instruction, with
subfunctions bit-encoded in cmask and mmask fields.


6.3.3 Integer Arithmetic and Logical Instructions


The integer arithmetic and logical instructions generally compute a result that is a 
function of two source operands and either write the result in a third (destination) 
register R[rd] or discard it. The first source operand is R[rs1]. The second source 
operand depends on the i bit in the instruction; if i = 0, then the second operand is 
R[rs2]; if i = 1, then the second operand is the constant simm10, simm11, or simm13
from the instruction itself, sign-extended to 64 bits. 
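
The operand selection described above can be sketched as follows. The helper names sign_extend and second_operand are hypothetical, the register file is modeled as a plain Python list, and 64-bit values are represented as unsigned bit patterns.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a 64-bit pattern."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):     # sign bit of the immediate field is set
        value -= 1 << bits
    return value & MASK64

def second_operand(i: int, regs: list, rs2: int, simm: int, simm_bits: int = 13) -> int:
    """Select the second source operand according to the i bit."""
    if i == 0:
        # R[0] always reads as zero, whatever the list holds.
        return 0 if rs2 == 0 else regs[rs2] & MASK64
    return sign_extend(simm, simm_bits)
```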


Note | The value of R[0] always reads as zero, and writes to it are 
ignored. 


6.3.3.1 Setting Condition Codes


Most integer arithmetic instructions have two versions: one sets the integer
condition codes (icc and xcc) as a side effect; the other does not affect the condition
codes. A special comparison instruction for integer values is not needed since it is
easily synthesized with the “subtract and set condition codes” (SUBcc) instruction.
See Synthetic Instructions on page 502 for details.


6.3.3.2 Shift Instructions 


Shift instructions shift an R register left or right by a constant or variable amount. 
None of the shift instructions change the condition codes. 


6.3.3.3 Set High 22 Bits of Low Word 


The “set high 22 bits of low word of an R register” instruction (SETHI) writes a
22-bit constant from the instruction into bits 31 through 10 of the destination
register. It clears the low-order 10 bits and high-order 32 bits, and it does not
affect the condition codes. Its primary use is to construct constants in registers.
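
A small model of the SETHI result, and of building a 32-bit constant by OR-ing in the low 10 bits afterward, might look like this. The function names are hypothetical, and the second step is an assumed illustration of how a constant can be completed, not a statement about any particular assembler.

```python
def sethi(imm22: int) -> int:
    """Value written to R[rd]: imm22 in bits 31:10, all other bits zero."""
    imm22 &= (1 << 22) - 1
    return imm22 << 10

def build_const32(value: int) -> int:
    """Construct a 32-bit constant from a SETHI followed by an OR."""
    value &= 0xFFFFFFFF
    high = sethi(value >> 10)    # upper 22 bits of the low word
    low = value & 0x3FF          # remaining low-order 10 bits
    return high | low
```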


6.3.3.4 Integer Multiply/Divide 


The integer multiply instruction performs a 64 × 64 → 64-bit operation; the integer
divide instructions perform 64 ÷ 64 → 64-bit operations. For compatibility with
SPARC V8 processors, 32 × 32 → 64-bit multiply instructions, 64 ÷ 32 → 32-bit divide
instructions, and the Multiply Step instruction are provided. Division by zero causes
a division_by_zero exception.


6.3.3.5 Tagged Add/Subtract 


The tagged add/subtract instructions assume tagged-format data, in which the tag is 
the two low-order bits of each operand. If either of the two operands has a nonzero 
tag or if 32-bit arithmetic overflow occurs, tag overflow is detected. If tag overflow 
occurs, then TADDcc and TSUBcc set the CCR.icc.v bit; if 64-bit arithmetic overflow 
occurs, then they set the CCR.xcc.v bit. 
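
The overflow rules for TADDcc can be sketched as below. This is an illustrative model only: the name taddcc is reused for a hypothetical Python function, operands are 64-bit unsigned bit patterns, and only the two overflow (v) bits are computed, not the other condition codes.

```python
MASK64 = (1 << 64) - 1

def taddcc(a: int, b: int):
    """Return (result, icc_v, xcc_v) for a tagged add of two 64-bit patterns."""
    a &= MASK64
    b &= MASK64
    result = (a + b) & MASK64
    # Tag overflow: either operand has a nonzero tag in its two low-order bits.
    tag_nonzero = (a & 3) != 0 or (b & 3) != 0
    # Signed overflow: operands agree in sign but the result does not.
    ovf_bits = (~(a ^ b)) & (a ^ result)
    overflow32 = ((ovf_bits >> 31) & 1) == 1
    overflow64 = ((ovf_bits >> 63) & 1) == 1
    icc_v = tag_nonzero or overflow32
    xcc_v = overflow64
    return result, icc_v, xcc_v
```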


The trapping versions (TADDccTV, TSUBccTV) of these instructions are deprecated. 
See Tagged Add on page 345 and Tagged Subtract on page 351 for details. 


6.3.4 Control-Transfer Instructions (CTIs)


The basic control-transfer instruction types are as follows: 


■ Conditional branch (Bicc, BPcc, BPr, FBfcc, FBPfcc)
■ Unconditional branch
■ Call and link (CALL)
■ Jump and link (JMPL, RETURN)
■ Return from trap (DONE, RETRY)
■ Trap (Tcc)


A control-transfer instruction functions by changing the value of the next program 
counter (NPC) or by changing the value of both the program counter (PC) and the 
next program counter (NPC). When only NPC is changed, the effect of the transfer of 
control is delayed by one instruction. Most control transfers are of the delayed 
variety. The instruction following a delayed control-transfer instruction is said to be 
in the delay slot of the control-transfer instruction. 


Some control transfer instructions (branches) can optionally annul, that is, not 
execute, the instruction in the delay slot, based on the setting of an annul bit in the 
instruction. The effect of the annul bit depends upon whether the transfer is taken 
or not taken and whether the branch is conditional or unconditional. Annulled 
delay instructions neither affect the program-visible state, nor can they cause a trap. 


Programming | The annul bit increases the likelihood that a compiler can find a
Note | useful instruction to fill the delay slot after a branch, thereby
reducing the number of instructions executed by a program. For
example, the annul bit can be used to move an instruction from
within a loop to fill the delay slot of the branch that closes the
loop.

Likewise, the annul bit can be used to move an instruction from
either the “else” or “then” branch of an “if-then-else” program
block to the delay slot of the branch that selects between them.
Since a full set of conditions is provided, a compiler can arrange
the code (possibly reversing the sense of the condition) so that
an instruction from either the “else” branch or the “then” branch
can be moved to the delay slot. Use of annulled branches
provided some benefit in older, single-issue SPARC
implementations. On an UltraSPARC Architecture
implementation, the only benefit of annulled branches might be
a slight reduction in code size. Therefore, the use of annulled
branch instructions is no longer encouraged.


TABLE 6-5 defines the value of the program counter and the value of the next
program counter after execution of each instruction. Conditional branches have two
forms: branches that test a condition (including branch-on-register), represented in
the table by Bcc, and branches that are unconditional, that is, always or never taken,
represented in the table by BA and BN, respectively. The effect of an annulled branch
is shown in the table through explicit transfers of control, rather than by fetching
and annulling the instruction.


TABLE 6-5 Control-Transfer Characteristics

Instruction Group   Address Form        Delayed   Taken   Annul Bit   New PC      New NPC
Non-CTIs            -                   -         -       -           NPC         NPC + 4
Bcc                 PC-relative         Yes       Yes     0           NPC         EA
Bcc                 PC-relative         Yes       No      0           NPC         NPC + 4
Bcc                 PC-relative         Yes       Yes     1           NPC         EA
Bcc                 PC-relative         Yes       No      1           NPC + 4     NPC + 8
BA                  PC-relative         Yes       Yes     0           NPC         EA
BA                  PC-relative         No        Yes     1           EA          EA + 4
BN                  PC-relative         Yes       No      0           NPC         NPC + 4
BN                  PC-relative         Yes       No      1           NPC + 4     NPC + 8
CALL                PC-relative         Yes       -       -           NPC         EA
JMPL, RETURN        Register-indirect   Yes       -       -           NPC         EA
DONE                Trap state          No        -       -           TNPC[TL]    TNPC[TL] + 4
RETRY               Trap state          No        -       -           TPC[TL]     TNPC[TL]
Tcc                 Trap vector         No        Yes     -           EA          EA + 4
Tcc                 Trap vector         No        No      -           NPC         NPC + 4
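
The branch, CALL, and JMPL rows of TABLE 6-5 can be expressed as a small function that maps the current NPC and the effective address to the new (PC, NPC) pair. This is a sketch under stated assumptions: the function name and the group labels are hypothetical, and the DONE, RETRY, and Tcc rows are omitted because they depend on trap state.

```python
def next_pc_npc(group: str, npc: int, ea: int = None, taken: bool = False, annul: int = 0):
    """Return (new_pc, new_npc) for the instruction groups modeled here.

    group: "non-cti", "bcc", "ba", "bn", "call", or "jmpl" (also covers RETURN).
    """
    if group == "non-cti":
        return npc, npc + 4
    if group == "bcc":
        if taken:
            return npc, ea                 # delay slot executes, then the target
        if annul == 0:
            return npc, npc + 4            # untaken: delay slot executes
        return npc + 4, npc + 8            # untaken and annulled: skip delay slot
    if group == "ba":
        # An annulled branch-always skips the delay slot and goes straight to EA.
        return (npc, ea) if annul == 0 else (ea, ea + 4)
    if group == "bn":
        return (npc, npc + 4) if annul == 0 else (npc + 4, npc + 8)
    if group in ("call", "jmpl"):
        return npc, ea
    raise ValueError("unmodeled instruction group: " + group)
```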

The effective address, “EA” in TABLE 6-5, specifies the target of the control-transfer
instruction. The effective address is computed in different ways, depending on the
particular instruction.


■ PC-relative effective address: A PC-relative effective address is computed by
sign-extending the instruction's immediate field to 64 bits, left-shifting the word
displacement by 2 bits to create a byte displacement, and adding the result to the
contents of the PC.


■ Register-indirect effective address: A register-indirect effective address
computes its target address as either R[rs1] + R[rs2] if i = 0, or
R[rs1] + sign_ext(simm13) if i = 1.

■ Trap vector effective address: A trap vector effective address first computes the
software trap number as the least significant 7 or 8 bits of R[rs1] + R[rs2] if
i = 0, or as the least significant 7 or 8 bits of R[rs1] + imm_trap_# if i = 1. Whether
7 or 8 bits are used depends on the privilege level: 7 bits are used in
nonprivileged mode and 8 bits are used in privileged mode. The trap level, TL, is
incremented. The hardware trap type is computed as 256 + the software trap
number and stored in TT[TL]. The effective address is generated by combining the
contents of the TBA register with the trap type and other data; see Trap Processing
on page 443 for details.


■ Trap state effective address: A trap state effective address is not computed but
is taken directly from either TPC[TL] or TNPC[TL].
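
Two of the computations above lend themselves to a short worked sketch: the PC-relative effective address and the hardware trap type produced by a taken Tcc. The function names and parameters are hypothetical; disp_bits stands for the width of the instruction's displacement field, which the caller must supply.

```python
MASK64 = (1 << 64) - 1

def pc_relative_ea(pc: int, disp: int, disp_bits: int) -> int:
    """Sign-extend a word displacement, convert it to bytes, and add the PC."""
    disp &= (1 << disp_bits) - 1
    if disp >> (disp_bits - 1):          # negative displacement
        disp -= 1 << disp_bits
    return (pc + (disp << 2)) & MASK64   # shift by 2: words to bytes

def tcc_trap_type(rs1_value: int, operand2: int, privileged: bool) -> int:
    """Hardware trap type stored in TT[TL]: 256 + software trap number."""
    mask = 0xFF if privileged else 0x7F  # 8 bits if privileged, else 7 bits
    return 256 + ((rs1_value + operand2) & mask)
```

With a 22-bit displacement of all ones (that is, -1 words), a branch at PC = 0x1000 targets 0xFFC.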


SPARC V8 | The SPARC V8 architecture specified that the delay instruction 
Compatibility | was always fetched, even if annulled, and that an annulled 
Note | instruction could not cause any traps. The SPARC V9 
architecture does not require the delay instruction to be fetched 
if it is annulled. 


6.3.4.1 Conditional Branches 





A conditional branch transfers control if the specified condition is TRUE. If the annul 
bit is 0, the instruction in the delay slot is always executed. If the annul bit is 1, the 
instruction in the delay slot is executed only when the conditional branch is taken. 


Note | The annulling behavior of a taken conditional branch is different
from that of an unconditional branch.


6.3.4.2 Unconditional Branches 


An unconditional branch transfers control unconditionally if its specified condition
is “always”; it never transfers control if its specified condition is “never.” If the
annul bit is 0, then the instruction in the delay slot is always executed. If the annul
bit is 1, then the instruction in the delay slot is never executed.


Note | The annul behavior of an unconditional branch is different from 
that of a taken conditional branch.


6.3.4.3 CALL and JMPL Instructions


The CALL instruction writes the contents of the PC, which points to the CALL 
instruction itself, into R[15] (out register 7) and then causes a delayed transfer of 
control to a PC-relative effective address. The value written into R[15] is visible to 
the instruction in the delay slot. 


The JMPL instruction writes the contents of the PC, which points to the JMPL
instruction itself, into R[rd] and then causes a register-indirect delayed transfer of
control to the address given by “R[rs1] + R[rs2]” or “R[rs1] + a signed immediate
value.” The value written into R[rd] is visible to the instruction in the delay slot.


When PSTATE.am = 1, the value of the high-order 32 bits transmitted to R[15] by the 
CALL instruction or to R[rd] by the JMPL instruction is zero. 


6.3.4.4 RETURN Instruction 


The RETURN instruction is used to return from a trap handler executing in 
nonprivileged mode. RETURN combines the control-transfer characteristics of a 
JMPL instruction with R[0] specified as the destination register and the register- 
window semantics of a RESTORE instruction. 


6.3.4.5 DONE and RETRY Instructions 


The DONE and RETRY instructions are used by privileged software to return from a 
trap. These instructions restore the machine state to values saved in the TSTATE 
register stack. 


RETRY returns to the instruction that caused the trap in order to reexecute it. DONE 
returns to the instruction pointed to by the value of NPC associated with the 
instruction that caused the trap, that is, the next logical instruction in the program. 
DONE presumes that the trap handler did whatever was requested by the program 
and that execution should continue. 


6.3.4.6 Trap Instruction (Tcc)


The Tcc instruction initiates a trap if the condition specified by its cond field matches
the current state of the condition code register specified in its cc field; otherwise, it
executes as a NOP. If the trap is taken, it increments the TL register, computes a trap
type that is stored in TT[TL], and transfers to a computed address in a trap table
pointed to by a trap base address register.


A Tcc instruction can specify one of 256 software trap types (128 when in
nonprivileged mode). When a Tcc is taken, 256 plus the 7 (in nonprivileged mode) or
8 (in privileged mode) least significant bits of the Tcc's second source operand are
written to TT[TL]. The only visible difference between a software trap generated by
a Tcc instruction and a hardware trap is the trap number in the TT register. See
Chapter 12, Traps, for more information.


Programming | Tcc can be used to implement breakpointing, tracing, and calls
Note | to privileged or hyperprivileged software. Tcc can also be used
for runtime checks, such as out-of-range array index checks or
integer overflow checks.


6.3.4.7 DCTI Couples


A delayed control transfer instruction (DCTI) in the delay slot of another DCTI is
referred to as a “DCTI couple”. The use of DCTI couples is deprecated in the
UltraSPARC Architecture; no new software should place a DCTI in the delay slot of
another DCTI, because on future UltraSPARC Architecture implementations DCTI
couples may execute either slowly or differently than the programmer assumes.


SPARC V8 and | The SPARC V8 architecture left behavior undefined for a DCTI 
SPARC V9 | couple. The SPARC V9 architecture defined behavior in that 
Compatibility | case, but as of UltraSPARC Architecture 2005, use of DCTI couples 
Note | is deprecated. 


6.3.5 Conditional Move Instructions


This subsection describes two groups of instructions that copy or move the contents 
of any integer or floating-point register. 


MOVcc and FMOVcc Instructions. The MOVcc and FMOVcc instructions copy
the contents of any integer or floating-point register to a destination integer or
floating-point register if a condition is satisfied. The condition to test is specified in
the instruction and can be any of the conditions allowed in conditional delayed
control-transfer instructions. This condition is tested against one of the six sets of
condition codes (icc, xcc, fcc0, fcc1, fcc2, and fcc3), as specified by the instruction.
For example:


fmovdg %fcc2, %f20, %f22


moves the contents of the double-precision floating-point register %f20 to register
%f22 if floating-point condition code number 2 (fcc2) indicates a greater-than
relation (FSR.fcc2 = 2). If fcc2 does not indicate a greater-than relation
(FSR.fcc2 ≠ 2), then the move is not performed.


The MOVcc and FMOVcc instructions can be used to eliminate some branches in
programs. In most implementations, branches will be more expensive than the
MOVcc or FMOVcc instructions. For example, the C statement:


if (A > B) X = 1; else X = 0;


can be coded as


cmp %i0, %i2 ! (A > B)
or %g0, 0, %i3 ! set X = 0
movg %xcc, 1, %i3 ! overwrite X with 1 if A > B


to eliminate the need for a branch.


MOVr and FMOVr Instructions. The MOVr and FMOVr instructions allow the 
contents of any integer or floating-point register to be moved to a destination integer 
or floating-point register if the contents of a register satisfy a specified condition. 
The conditions to test are enumerated in TABLE 6-6. 


TABLE 6-6 MOVr and FMOVr Test Conditions

Condition   Description
NZ          Nonzero
Z           Zero
GEZ         Greater than or equal to zero
LZ          Less than zero
LEZ         Less than or equal to zero
GZ          Greater than zero





Any of the integer registers (treated as a signed value) may be tested for one of the
conditions, and the result used to control the move. For example,


movrnz %i2, %i4, %i6


moves integer register %i4 to integer register %i6 if integer register %i2 contains a
nonzero value.


MOVr and FMOVr can be used to eliminate some branches in programs or can 
emulate multiple unsigned condition codes by using an integer register to hold the 
result of a comparison. 
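
The register tests of TABLE 6-6 can be modeled as predicates over a signed 64-bit value. In this sketch the names to_signed64, CONDITIONS, and movr are hypothetical, and register contents are passed in as unsigned 64-bit patterns.

```python
def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a signed two's-complement value."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value

# One predicate per condition in TABLE 6-6.
CONDITIONS = {
    "NZ":  lambda v: v != 0,
    "Z":   lambda v: v == 0,
    "GEZ": lambda v: v >= 0,
    "LZ":  lambda v: v < 0,
    "LEZ": lambda v: v <= 0,
    "GZ":  lambda v: v > 0,
}

def movr(cond: str, tested: int, source: int, dest: int) -> int:
    """Return the new destination value: source if the test passes, else dest."""
    return source if CONDITIONS[cond](to_signed64(tested)) else dest
```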


6.3.6 Register Window Management Instructions


This subsection describes the instructions that manage register windows in the 
UltraSPARC Architecture. The privileged registers affected by these instructions are 
described in Register-Window PR State Registers on page 81. 


6.3.6.1 SAVE Instruction 


The SAVE instruction allocates a new register window and saves the caller's register 
window by incrementing the CWP register. 


116 UltraSPARC Architecture 2005 • Draft D0.9.2, 19 Jun 2008


If CANSAVE = 0, then execution of a SAVE instruction causes a window spill
exception, that is, one of the spill_n_<normal|other> exceptions.


If CANSAVE ≠ 0 but the number of clean windows is zero, that is,
(CLEANWIN - CANRESTORE) = 0, then SAVE causes a clean_window exception.


If SAVE does not cause an exception, it performs an ADD operation, decrements 
CANSAVE, and increments CANRESTORE. The source registers for the ADD 
operation are from the old window (the one to which CWP pointed before the 
SAVE), while the result is written into a register in the new window (the one to 
which the incremented CWP points). 


6.3.6.2 RESTORE Instruction 


The RESTORE instruction restores the previous register window by decrementing 
the CWP register. 


If CANRESTORE = 0, execution of a RESTORE instruction causes a window fill
exception, that is, one of the fill_n_<normal|other> exceptions.


If RESTORE does not cause an exception, it performs an ADD operation, decrements 
CANRESTORE, and increments CANSAVE. The source registers for the ADD are 
from the old window (the one to which CWP pointed before the RESTORE), and the 
result is written into a register in the new window (the one to which the 
decremented CWP points). 
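
The window bookkeeping performed by SAVE and RESTORE can be sketched as a small state machine. This is an illustrative model, not the architectural definition: the class and its return strings are hypothetical, CWP is assumed to wrap modulo the number of windows, and the ADD operation and the choice between normal and other spill/fill traps are not modeled.

```python
from dataclasses import dataclass

@dataclass
class WindowState:
    n_windows: int    # N_REG_WINDOWS
    cwp: int
    cansave: int
    canrestore: int
    cleanwin: int

    def save(self):
        """Model SAVE; return an exception name, or None on success."""
        if self.cansave == 0:
            return "spill"
        if self.cleanwin - self.canrestore == 0:
            return "clean_window"
        self.cwp = (self.cwp + 1) % self.n_windows
        self.cansave -= 1
        self.canrestore += 1
        return None

    def restore(self):
        """Model RESTORE; return an exception name, or None on success."""
        if self.canrestore == 0:
            return "fill"
        self.cwp = (self.cwp - 1) % self.n_windows
        self.canrestore -= 1
        self.cansave += 1
        return None
```

Note that when an exception is reported, the sketch leaves every register unchanged, so the same SAVE or RESTORE can be attempted again after the condition is resolved.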


Programming | This note describes a common convention for use of register 
Note | windows, SAVE, RESTORE, CALL, and JMPL instructions. 


A procedure is invoked by execution of a CALL (or a JMPL) 
instruction. If the procedure requires a register window, it 
executes a SAVE instruction in its prologue code. A routine that 
does not allocate a register window of its own (possibly a leaf 
procedure) should not modify any windowed registers except 
out registers 0 through 6. This optimization, called “Leaf- 
Procedure Optimization”, is routinely performed by SPARC 
compilers. 


A procedure that uses a register window returns by executing 
both a RESTORE and a JMPL instruction. A procedure that has 
not allocated a register window returns by executing a JMPL 
only. The target address for the JMPL instruction is normally 8 
plus the address saved by the calling instruction, that is, the 
instruction after the instruction in the delay slot of the calling 
instruction. 


The SAVE and RESTORE instructions can be used to atomically 
establish a new memory stack pointer in an R register and 
switch to a new or previous register window.


6.3.6.3 SAVED Instruction


SAVED is a privileged instruction used by a spill trap handler to indicate that a 
window spill has completed successfully. It increments CANSAVE and decrements 
either OTHERWIN or CANRESTORE, depending on the conditions at the time 
SAVED is executed. 


See SAVED on page 302 for details. 


6.3.6.4 RESTORED Instruction 


RESTORED is a privileged instruction, used by a fill trap handler to indicate that a 
window has been filled successfully. It increments CANRESTORE and decrements 
either OTHERWIN or CANSAVE, depending on the conditions at the time 
RESTORED is executed. RESTORED also manipulates CLEANWIN, which is used to 
ensure that no address space’s data become visible to another address space through 
windowed registers. 


See RESTORED on page 294 for details. 


6.3.6.5 Flush Windows Instruction 


The FLUSHW instruction flushes all of the register windows, except the current 
window, by performing repetitive spill traps. The FLUSHW instruction causes a spill 
trap if any register window (other than the current window) has valid contents. The 
number of windows with valid contents is computed as: 


N_REG_WINDOWS - 2 - CANSAVE


If this number is nonzero, the FLUSHW instruction causes a spill trap. Otherwise, 
FLUSHW has no effect. If the spill trap handler exits with a RETRY instruction, the 
FLUSHW instruction continues causing spill traps until all the register windows 
except the current window have been flushed. 
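
The FLUSHW retry loop can be illustrated with a short simulation. It assumes, purely for the sake of the example, that each spill trap handler spills exactly one window, executes SAVED (which increments CANSAVE), and exits with RETRY; the function names are hypothetical.

```python
def valid_windows(n_reg_windows: int, cansave: int) -> int:
    """Number of windows, other than the current one, with valid contents."""
    return n_reg_windows - 2 - cansave

def flushw_spill_traps(n_reg_windows: int, cansave: int) -> int:
    """Count the spill traps FLUSHW causes before it completes with no effect."""
    if not 0 <= cansave <= n_reg_windows - 2:
        raise ValueError("CANSAVE out of range for this window count")
    traps = 0
    while valid_windows(n_reg_windows, cansave) != 0:
        traps += 1      # FLUSHW causes a spill trap
        cansave += 1    # assumed handler: spill one window, then SAVED
        # The handler exits with RETRY, so FLUSHW executes again.
    return traps
```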


6.3.7 Ancillary State Register (ASR) Access


The read/write state register instructions access program-visible state and status 
registers. These instructions read/write the state registers into/from R registers. A 
read/write Ancillary State register instruction is privileged only if the accessed 
register is privileged. 


The supported RDasr and WRasr instructions are described in Ancillary State 
Registers on page 67.


6.3.8 Privileged Register Access


The read/write privileged register instructions access state and status registers that 
are visible only to privileged software. These instructions read/write privileged 
registers into/from R registers. The read/write privileged register instructions are 
privileged. 


6.3.9 Floating-Point Operate (FPop) Instructions


Floating-point operate instructions (FPops) compute a result that is a function of one
or two source operands and place the result in one or more destination F registers,
with one exception: floating-point compare operations do not write to an F register
but instead update one of the fccn fields of the FSR.


The term “FPop” refers to instructions in the FPop1 and FPop2 opcode spaces. FPop
instructions do not include FBfcc instructions, loads and stores between memory
and the F registers, or non-floating-point operations that read or write F registers.


The FMOVcc instructions function for the floating-point registers as the MOVcc 
instructions do for the integer registers. See MOVcc and FMOVcc Instructions on page 
115. 


The FMOVr instructions function for the floating-point registers as the MOVr 
instructions do for the integer registers. See MOVr and FMOVr Instructions on page 
116. 


If no floating-point unit is present or if PSTATE.pef = 0 or FPRS.fef = 0, then any
instruction, including an FPop instruction, that attempts to access an FPU register
generates an fp_disabled exception.


All FPop instructions clear the ftt field and set the cexc field unless they generate an
exception. Floating-point compare instructions also write one of the fccn fields. All
FPop instructions that can generate IEEE exceptions set the cexc and aexc fields
unless they generate an exception. FABS<s|d|q>, FMOV<s|d|q>,
FMOVcc<s|d|q>, FMOVr<s|d|q>, and FNEG<s|d|q> cannot generate IEEE
exceptions, so they clear cexc and leave aexc unchanged.


IMPL. DEP. #3-V8: An implementation may indicate that a floating-point instruction
did not produce a correct IEEE Std 754-1985 result by generating an
fp_exception_other exception with FSR.ftt = unfinished_FPop or
FSR.ftt = unimplemented_FPop. In this case, software running in a mode with
greater privileges must emulate any functionality not present in the hardware.


See ftt = 2 (unfinished_FPop) on page 62 to see which instructions can produce an
fp_exception_other exception (with FSR.ftt = unfinished_FPop). See ftt = 3
(unimplemented_FPop) on page 62 to see which instructions can produce an
fp_exception_other exception (with FSR.ftt = unimplemented_FPop).


6.3.10 Implementation-Dependent Instructions


The SPARC V9 architecture provided two instruction spaces that are entirely 
implementation dependent: IMPDEP1 and IMPDEP2. 


In the UltraSPARC Architecture, the IMPDEP1 opcode space is used by VIS 
instructions. 


In the UltraSPARC Architecture, IMPDEP2 is subdivided into IMPDEP2A and
IMPDEP2B. IMPDEP2A remains implementation dependent. The IMPDEP2B opcode
space is reserved for implementation of floating-point
multiply-add/multiply-subtract instructions.


6.3.11 Reserved Opcodes and Instruction Fields


If a conforming UltraSPARC Architecture 2005 implementation attempts to execute 
an instruction bit pattern that is not specifically defined in this specification, it 
behaves as follows: 


■ If the instruction bit pattern encodes an implementation-specific extension to the
instruction set, that extension is executed.


■ If the instruction bit pattern does not encode an extension to the instruction set,
but would decode as a valid instruction if nonzero bits in reserved instruction
field(s) were ignored (read as 0):


  ▪ The recommended behavior is to generate an illegal_instruction exception (or,
    for FPop, an fp_exception_other exception with FSR.ftt = 3
    (unimplemented_FPop)).


  ▪ Alternatively, the implementation can ignore the nonzero reserved field bits
    and execute the instruction as if those bits had been zero.


■ If the instruction bit pattern does not encode an extension to the instruction set
and would still not decode as a valid instruction if nonzero bits in reserved
instruction field(s) were ignored, then the instruction bit pattern is invalid and
causes an exception. Specifically, attempting to execute an FPop instruction (see
Floating-Point Operate on page 29) causes an fp_exception_other exception (with
FSR.ftt = unimplemented_FPop); attempting to execute any other invalid
instruction bit pattern causes an illegal_instruction exception.


Forward | To further enhance backward (and forward) binary
Compatibility | compatibility, the next revision of the UltraSPARC Architecture
Note | is expected to require an illegal_instruction exception to be
generated by any instruction bit pattern that encodes neither a
known UltraSPARC Architecture instruction nor an
implementation-specific extension instruction (including those
with nonzero bits in reserved instruction fields).


See Appendix A, Opcode Maps, for an enumeration of the reserved instruction bit
patterns (opcodes).


Implementation | As described above, implementations are strongly encouraged, 
Note | but not strictly required, to trap on nonzero values in reserved 
instruction fields. 


Programming | For software portability, software (such as assemblers, static
Note | compilers, and dynamic compilers) that generates SPARC
instructions must always generate zeroes in instruction fields 
marked “reserved” ("—"). 


CHAPTER 7


Instructions 





UltraSPARC Architecture 2005 extends the standard SPARC V9 instruction set with 
additional classes of instructions: 


■ Enhanced functionality:
  ▪ Instructions for alignment (Align Address on page 135)
  ▪ Array handling (Three-Dimensional Array Addressing on page 138)
  ▪ Byte-permutation instructions (Byte Mask and Shuffle on page 144)
  ▪ Edge handling (Edge Handling Instructions on pages 156 and 158)
  ▪ Logical operations on floating-point registers (F Register Logical Operate
    (1 operand) on page 211)
  ▪ Partitioned arithmetic (Fixed-point Partitioned Add on page 203 and Fixed-point
    Partitioned Subtract (64-bit) on page 208)
  ▪ Pixel manipulation (FEXPAND on page 172, FPACK on page 197, and
    FPMERGE on page 206)

■ Efficient memory access:
  ▪ Partial store (Store Partial Floating-Point on page 329)
  ▪ Short floating-point loads and stores (Store Short Floating-Point on page 332)
  ▪ Block load and store (Block Load on page 232 and Block Store on page 317)

■ Efficient interval arithmetic: SIAM (Set Interval Arithmetic Mode on page 308) and
all instructions that reference GSR.im


TABLE 7-2 provides a quick index of instructions, alphabetically by architectural 
instruction name. 


TABLE 7-3 summarizes the instruction set, listed within functional categories. 




Within these tables and throughout the rest of this chapter, and in Appendix A, 
Opcode Maps, certain opcodes are marked with mnemonic superscripts. The 
superscripts and their meanings are defined in TABLE 7-1. 


TABLE 7-1 Instruction Superscripts

Superscript   Meaning
D             Deprecated instruction (do not use in new software)
N             Nonportable instruction
P             Privileged instruction
P_ASI         Privileged action if bit 7 of the referenced ASI is 0
P_ASR         Privileged instruction if the referenced ASR register is privileged
P_npt         Privileged action if in nonprivileged mode (PSTATE.priv = 0) and
              nonprivileged access is disabled
P_PIC         Privileged action if PCR.priv = 1

Page   Instruction
134    ADD (ADDcc)
134    ADDC (ADDCcc)
135    ALIGNADDRESS[_LITTLE]
136    ALLCLEAN
137    AND (ANDcc)
138    ARRAY<8|16|32>
142    Bicc
144    BMASK
145    BPcc
148    BPr
144    BSHUFFLE
150    CALL
151    CASA^P_ASI
151    CASXA^P_ASI
154    DONE^P
156    EDGE<8|16|32>[L]cc
158    EDGE<8|16|32>[L]N
218    F<s|d|q>TO<s|d|q>
216    F<s|d|q>TOi
216    F<s|d|q>TOx
159    FABS<s|d|q>
160    FADD<s|d|q>
161    FALIGNDATA
214    FANDNOT<1|2>[s]
214    FAND[s]
162    FBfcc^D
164    FBPfcc
169    FCMP<s|d|q>
166    FCMP*<16,32>
169    FCMPE<s|d|q>
171    FDIV<s|d|q>
194    FdMULq
172    FEXPAND
173    FiTO<s|d|q>
174    FLUSH
177    FLUSHW
178    FMOV<s|d|q>
180    FMOV<s|d|q>cc
185    FMOV<s|d|q>R
194    FMUL<s|d|q>
188    FMUL8[SU|UL]x16
188    FMUL8x16
188    FMUL8x16[AU|AL]
188    FMULD8[SU|UL]x16
214    FNAND[s]
196    FNEG<s|d|q>
214    FNOR[s]
212    FNOT<1|2>[s]
211    FONE[s]
214    FORNOT<1|2>[s]
214    FOR[s]
197    FPACK<16|32|FIX>
203    FPADD<16,32>[S]
206    FPMERGE
208    FPSUB<16,32>[S]
194    FsMULd
215    FSQRT<s|d|q>
212    FSRC<1|2>[s]
220    FSUB<s|d|q>
214    FXNOR[s]
214    FXOR[s]
221    FxTO<s|d|q>
211    FZERO[s]
222    ILLTRAP
223    IMPDEP2A
223    IMPDEP2B
225    INVALW
226    JMPL
232    LDBLOCKF
236    LDDF
239    LDDFA^P_ASI
236    LDF
239    LDFA^P_ASI
243    LDFSR^D
236    LDQF
239    LDQFA^P_ASI
227    LDSB
229    LDSBA^P_ASI
227    LDSH
229    LDSHA^P_ASI
245    LDSHORTF
247    LDSTUB
248    LDSTUBA^P_ASI
227    LDSW
229    LDSWA^P_ASI
255    LDTXA^N
250    LDTW^D
252    LDTWA^D,P_ASI
247    LDUB
229    LDUBA^P_ASI
227    LDUH
229    LDUHA^P_ASI
227    LDUW
229    LDUWA^P_ASI
227    LDX
229    LDXA^P_ASI
258    LDXFSR
260    MEMBAR
264    MOVcc
268    MOVr
270    MULScc^D
272    MULX
273    NOP
274    NORMALW
275    OR (ORcc)
275    ORN (ORNcc)
276    OTHERW
277    PDIST
278    POPC
280    PREFETCH
280    PREFETCHA^P_ASI
287    RDASI
287    RDasr^P_ASR
287    RDCCR
287    RDFPRS
287    RDGSR
287    RDPC
287    RDPCR^P
287    RDPIC^P_PIC
290    RDPR^P
287    RDSOFTINT^P
287    RDSTICK_CMPR^P
287    RDSTICK^P_npt
287    RDTICK_CMPR^P
287    RDTICK^P_npt
294    RESTORED^P
292    RESTORE
296    RETRY^P
298    RETURN
302    SAVED^P
300    SAVE
304    SDIV^D (SDIVcc^D)
272    SDIVX
306    SETHI
307    SHUTDOWN^D,P
308    SIAM
309    SLL
309    SLLX
311    SMUL^D (SMULcc^D)
309    SRA
309    SRAX
309    SRL
309    SRLX
313    STB
314    STBA^P_ASI
317    STBLOCKF
321    STDF
323    STDFA^P_ASI
321    STF
323    STFA^P_ASI
327    STFSR^D
313    STH
314    STHA^P_ASI
329    STPARTIALF
321    STQF
323    STQFA^P_ASI
332    STSHORTF
334    STTW^D
336    STTWA^D,P_ASI
313    STW
314    STWA^P_ASI
313    STX
314    STXA^P_ASI
339    STXFSR
341    SUB (SUBcc)
341    SUBC (SUBCcc)
343    SWAPA^D,P_ASI
342    SWAP^D
345    TADDcc
346    TADDccTV^D
348    Tcc
351    TSUBcc
352    TSUBccTV^D
354    UDIV^D (UDIVcc^D)
272    UDIVX
356    UMUL^D (UMULcc^D)
358    WRASI
358    WRasr^P_ASR
358    WRCCR
358    WRFPRS
358    WRGSR
358    WRPCR^P
358    WRPIC^P_PIC
360    WRPR^P
358    WRSOFTINT_CLR^P
358    WRSOFTINT_SET^P
358    WRSOFTINT^P
358    WRSTICK_CMPR^P
358    WRSTICK^P
358    WRTICK_CMPR^P
358    WRY^D
363    XNOR (XNORcc)
363    XOR (XORcc)
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TABLE 7-3 Instruction Set - by Functional Category (1 of 6) 




















Instruction  Category and Function  Page  Ext. to V9?
Data Movement Operations, Between R Registers
MOVcc  Move integer register if condition is satisfied  264
MOVr  Move integer register on contents of integer register  268
Data Movement Operations, Between F Registers
FMOV<s|d|q>  Floating-point move  178
FMOV<s|d|q>cc  Move floating-point register if condition is satisfied  180
FMOV<s|d|q>R  Move f-p reg. if integer reg. contents satisfy condition  185
FSRC<1|2>[s]  Copy source  212  VIS 1
Data Conversion Instructions
FiTO<s|d|q>  Convert 32-bit integer to floating-point  173
F<s|d|q>TOi  Convert floating point to integer  216
F<s|d|q>TOx  Convert floating point to 64-bit integer  216
F<s|d|q>TO<s|d|q>  Convert between floating-point formats  218
FxTO<s|d|q>  Convert 64-bit integer to floating-point  221
Logical Operations on R Registers
AND (ANDcc)  Logical and (and modify condition codes)  137
OR (ORcc)  Inclusive-or (and modify condition codes)  275
ORN (ORNcc)  Inclusive-or not (and modify condition codes)  275
XNOR (XNORcc)  Exclusive-nor (and modify condition codes)  363
XOR (XORcc)  Exclusive-or (and modify condition codes)  363
Logical Operations on F Registers
FAND[s]  Logical and operation  214  VIS 1
FANDNOT<1|2>[s]  Logical and operation with one inverted source  214  VIS 1
FNAND[s]  Logical nand operation  214  VIS 1
FNOR[s]  Logical nor operation  214  VIS 1
FNOT<1|2>[s]  Copy negated source  212  VIS 1
FONE[s]  One fill  211  VIS 1
FOR[s]  Logical or operation  214  VIS 1
FORNOT<1|2>[s]  Logical or operation with one inverted source  214  VIS 1
FXNOR[s]  Logical xnor operation  214  VIS 1
FXOR[s]  Logical xor operation  214  VIS 1
FZERO[s]  Zero fill  211  VIS 1
Shift Operations on R Registers
SLL  Shift left logical  309
SLLX  Shift left logical, extended  309
SRA  Shift right arithmetic  309
SRAX  Shift right arithmetic, extended  309
SRL Shift right logical 309 
SRLX Shift right logical, extended 309 

Special Addressing Operations
ALIGNADDRESS[_LITTLE]  Calculate address for misaligned data  135  VIS 1
ARRAY<8|16|32>  3-D array addressing instructions  138  VIS 1
FALIGNDATA  Perform data alignment for misaligned data  161  VIS 1
Control Transfers
Bicc  Branch on integer condition codes  142
BPcc  Branch on integer condition codes with prediction  145
BPr  Branch on contents of integer register with prediction  148
CALL  Call and link  150
DONE^P  Return from trap  154
FBfcc^D  Branch on floating-point condition codes  162
FBPfcc  Branch on floating-point condition codes with prediction  164
ILLTRAP  Illegal instruction  222
JMPL  Jump and link  226
RETRY^P  Return from trap and retry  296
RETURN  Return  298
Tcc  Trap on integer condition codes  348
Byte Permutation
BMASK  Set the GSR.mask field  144  VIS 2
BSHUFFLE  Permute bytes as specified by GSR.mask  144  VIS 2
Data Formatting Operations on F Registers
FEXPAND  Pixel expansion  172  VIS 1
FPACK<16|32|FIX>  Pixel packing  197  VIS 1
FPMERGE  Pixel merge  206  VIS 1
Memory Operations to/from F Registers
LDBLOCKF  Block loads  232  VIS 1
STBLOCKF  Block stores  317  VIS 1
LDDF  Load double floating-point  236
LDDFA^PASI  Load double floating-point from alternate space  239
LDF  Load floating-point  236
LDFA^PASI  Load floating-point from alternate space  239
LDQF  Load quad floating-point  236
LDQFA^PASI  Load quad floating-point from alternate space  239
LDSHORTF  Short floating-point loads  245  VIS 1
STDF  Store double floating-point  321
STDFA^PASI  Store double floating-point into alternate space  323
STF  Store floating-point  321
STFA^PASI  Store floating-point into alternate space  323
STPARTIALF  Partial Store instructions  329  VIS 1
STQF  Store quad floating-point  321
STQFA^PASI  Store quad floating-point into alternate space  323
STSHORTF  Short floating-point stores  332  VIS 1
Memory Operations - Miscellaneous
LDFSR^D  Load floating-point state register (lower)  243
LDXFSR  Load floating-point state register  258
MEMBAR  Memory barrier  260
PREFETCH  Prefetch data  280
PREFETCHA^PASI  Prefetch data from alternate space  280
STFSR^D  Store floating-point state register (lower)  327
STXFSR  Store floating-point state register  339
Atomic (Load-Store) Memory Operations to/from R Registers
CASA^PASI  Compare and swap word in alternate space  151
CASXA^PASI  Compare and swap doubleword in alternate space  151
LDSTUB  Load-store unsigned byte  247
LDSTUBA^PASI  Load-store unsigned byte in alternate space  248
SWAP^D  Swap integer register with memory  342
SWAPA^D,PASI  Swap integer register with memory in alternate space  343
Memory Operations to/from R Registers
LDSB  Load signed byte  227
LDSBA^PASI  Load signed byte from alternate space  229
LDSH  Load signed halfword  227
LDSHA^PASI  Load signed halfword from alternate space  229
LDSW  Load signed word  227
LDSWA^PASI  Load signed word from alternate space  229
LDTXA^N  Load integer twin extended word from alternate space  255  VIS 2+
LDTW^D  Load integer twin word  250
LDTWA^D,PASI  Load integer twin word from alternate space  252
LDUB  Load unsigned byte  247
LDUBA^PASI  Load unsigned byte from alternate space  229
LDUH  Load unsigned halfword  227
LDUHA^PASI  Load unsigned halfword from alternate space  229
LDUW  Load unsigned word  227
LDUWA^PASI  Load unsigned word from alternate space  229
LDX  Load extended  227
LDXA^PASI  Load extended from alternate space  229
STB  Store byte  313
STBA^PASI  Store byte into alternate space  314
STTW^D  Store twin word  334
STTWA^D,PASI  Store twin word into alternate space  336
STH  Store halfword  313
STHA^PASI  Store halfword into alternate space  314
STW  Store word  313
STWA^PASI  Store word into alternate space  314
STX  Store extended  313
STXA^PASI  Store extended into alternate space  314
Floating-Point Arithmetic Operations
FABS<s|d|q>  Floating-point absolute value  159
FADD<s|d|q>  Floating-point add  160
FDIV<s|d|q>  Floating-point divide  171
FdMULq  Floating-point multiply double to quad  194
FMUL<s|d|q>  Floating-point multiply  194
FNEG<s|d|q>  Floating-point negate  196
FsMULd  Floating-point multiply single to double  194
FSQRT<s|d|q>  Floating-point square root  215
FSUB<s|d|q>  Floating-point subtract  220
Floating-Point Comparison Operations
FCMP*<16,32>  Compare four 16-bit signed values or two 32-bit signed values  166  VIS 1
FCMP<s|d|q>  Floating-point compare  169
FCMPE<s|d|q>  Floating-point compare (exception if unordered)  169
Register-Window Control Operations
ALLCLEAN  Mark all register window sets as "clean"  136
INVALW  Mark all register window sets as "invalid"  225
FLUSHW  Flush register windows  177
NORMALW  "Other" register windows become "normal" register windows  274
OTHERW  "Normal" register windows become "other" register windows  276
RESTORE  Restore caller's window  292
RESTORED^P  Window has been restored  294
SAVE  Save caller's window  300
SAVED^P  Window has been saved  302
Miscellaneous Operations
FLUSH  Flush instruction memory  174
IMPDEP2A  Implementation-dependent instructions  223
IMPDEP2B  Implementation-dependent instructions (reserved)  223
NOP  No operation  273
SHUTDOWN^D,P  Shut down the virtual processor  307  VIS 1
Integer SIMD Operations on F Registers
FPADD<16,32>[S]  Fixed-point partitioned add  203  VIS 1
FPSUB<16,32>[S]  Fixed-point partitioned subtract  208  VIS 1
Integer Arithmetic Operations on R Registers
ADD (ADDcc)  Add (and modify condition codes)  134
ADDC (ADDCcc)  Add with carry (and modify condition codes)  134
MULScc^D  Multiply step (and modify condition codes)  270
MULX  Multiply 64-bit integers  272
SDIV^D (SDIVcc^D)  32-bit signed integer divide (and modify condition codes)  304
SDIVX  64-bit signed integer divide  272
SMUL^D (SMULcc^D)  Signed integer multiply (and modify condition codes)  311
SUB (SUBcc)  Subtract (and modify condition codes)  341
SUBC (SUBCcc)  Subtract with carry (and modify condition codes)  341
TADDcc  Tagged add and modify condition codes (trap on overflow)  345
TADDccTV^D  Tagged add and modify condition codes (trap on overflow)  346
TSUBcc  Tagged subtract and modify condition codes (trap on overflow)  351
TSUBccTV^D  Tagged subtract and modify condition codes (trap on overflow)  352
UDIV^D (UDIVcc^D)  Unsigned integer divide (and modify condition codes)  354
UDIVX  64-bit unsigned integer divide  272
UMUL^D (UMULcc^D)  Unsigned integer multiply (and modify condition codes)  356
Integer Arithmetic Operations on F Registers
FMUL8x16  8x16 partitioned product  188  VIS 1
FMUL8x16[AU|AL]  8x16 upper/lower α partitioned product  188  VIS 1
FMUL8[SU|UL]x16  8x16 upper/lower partitioned product  188  VIS 1
FMULD8[SU|UL]x16  8x16 upper/lower partitioned product  188  VIS 1
Miscellaneous Operations on R Registers
POPC  Population count  278
SETHI  Set high 22 bits of low word of integer register  306
Miscellaneous Operations on F Registers
EDGE<8|16|32>[L]cc  Edge handling instructions (and modify condition codes)  156  VIS 1
EDGE<8|16|32>[L]N  Edge handling instructions  158  VIS 2
PDIST  Pixel component distance  277  VIS 1
Control and Status Register Access
RDASI  Read ASI register  287
RDasr^PASR  Read ancillary state register  287
RDCCR  Read Condition Codes register (CCR)  287
RDFPRS  Read Floating-Point Registers State register (FPRS)  287
RDGSR  Read General Status register (GSR)  287
RDPC  Read Program Counter register (PC)  287
RDPCR^P  Read Performance Control register (PCR)  287
RDPIC^PPIC  Read Performance Instrumentation Counters register (PIC)  287
RDPR^P  Read privileged register  290
RDSOFTINT^P  Read per-virtual processor Soft Interrupt register (SOFTINT)  287
RDSTICK^PNPT  Read System Tick register (STICK)  287
RDSTICK_CMPR^P  Read System Tick Compare register (STICK_CMPR)  287
RDTICK^PNPT  Read Tick register (TICK)  287
RDTICK_CMPR^P  Read Tick Compare register (TICK_CMPR)  287
SIAM  Set interval arithmetic mode  308  VIS 2
WRASI  Write ASI register  358
WRasr^PASR  Write ancillary state register  358
WRCCR  Write Condition Codes register (CCR)  358
WRFPRS  Write Floating-Point Registers State register (FPRS)  358
WRGSR  Write General Status register (GSR)  358
WRPCR^P  Write Performance Control register (PCR)  358
WRPIC^PPIC  Write Performance Instrumentation Counters register (PIC)  358
WRPR^P  Write privileged register  360
WRSOFTINT^P  Write per-virtual processor Soft Interrupt register (SOFTINT)  358
WRSOFTINT_CLR^P  Clear bits of per-virtual processor Soft Interrupt register (SOFTINT)  358
WRSOFTINT_SET^P  Set bits of per-virtual processor Soft Interrupt register (SOFTINT)  358
WRTICK_CMPR^P  Write Tick Compare register (TICK_CMPR)  358
WRSTICK^P  Write System Tick register (STICK)  358
WRSTICK_CMPR^P  Write System Tick Compare register (STICK_CMPR)  358
WRY^D  Write Y register  358
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In the remainder of this chapter, related instructions are grouped into subsections. 
Each subsection consists of the following sets of information: 


(1) Instruction Table. This lists the instructions that are defined in the subsection, 
including the values of the field(s) that uniquely identify the instruction(s), assembly 
language syntax, and software and implementation classifications for the 
instructions. (A description of the Software Classes [letters] and Implementation 
Classes [digits] will be provided in a later update to this specification.) 


Note | Instruction classes will be defined in a later draft of this document 
and, in the meantime, are subject to change. 


(2) Illustration of Instruction Format(s). These illustrations show how the 
instruction is encoded in a 32-bit word in memory. In them, a dash (—) indicates 
that the field is reserved for future versions of the architecture and must be 0 in any 
instance of the instruction. If a conforming UltraSPARC Architecture 
implementation encounters nonzero values in these fields, its behavior is as defined 
in Reserved Opcodes and Instruction Fields on page 120. 


(3) Description. This subsection describes the operation of the instruction, its 
features, restrictions, and exception-causing conditions. 


(4) Exceptions. The exceptions that can occur as a consequence of attempting to 
execute the instruction(s). Exceptions due to an instruction_access_exception and 
interrupts are not listed because they can occur on any instruction. An FPop that is 
not implemented in hardware generates an fp_exception_other exception with 
FSR.ftt = unimplemented_FPop when executed. A non-FPop instruction not 
implemented in hardware generates an illegal_instruction exception and therefore 
will not generate any of the other exceptions listed. Exceptions are listed in order of 
trap priority (see Trap Priorities on page 442), from highest to lowest priority. 


(5) See Also. A list of related instructions (on selected pages). 


Note | This specification does not contain any timing information (in 
either cycles or elapsed time), since timing is always 
implementation dependent. 
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ADD 





7.1 Add 


Instruction  op3      Operation                                Assembly Language Syntax             Class
ADD          00 0000  Add                                      add    reg_rs1, reg_or_imm, reg_rd   A1
ADDcc        01 0000  Add and modify cc's                      addcc  reg_rs1, reg_or_imm, reg_rd   A1
ADDC         00 1000  Add with 32-bit Carry                    addc   reg_rs1, reg_or_imm, reg_rd   A1
ADDCcc       01 1000  Add with 32-bit Carry and modify cc's    addccc reg_rs1, reg_or_imm, reg_rd   A1


Format (i = 0): op = 10 (31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i = 0 (13) | - (12:5) | rs2 (4:0)
Format (i = 1): op = 10 (31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i = 1 (13) | simm13 (12:0)


Description If i = 0, ADD and ADDcc compute "R[rs1] + R[rs2]". If i = 1, they compute 
"R[rs1] + sign_ext(simm13)". In either case, the sum is written to R[rd]. 


ADDC and ADDCcc ("ADD with carry") also add the CCR register's 32-bit carry 
(icc.c) bit. That is, if i = 0, they compute "R[rs1] + R[rs2] + icc.c" and if i = 1, they 
compute "R[rs1] + sign_ext(simm13) + icc.c". In either case, the sum is written to 
R[rd]. 


ADDcc and ADDCcc modify the integer condition codes (CCR.icc and CCR.xcc). 
Overflow occurs on addition if both operands have the same sign and the sign of the 
sum is different from that of the operands. 


Programming | ADDC and ADDCcc read the 32-bit condition codes’ carry bit 
Note | (CCR.icc.c), not the 64-bit condition codes’ carry bit (CCR.xcc.c). 
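
The ADD family described above can be sketched as a small Python model. This is an illustrative sketch, not part of the specification: the names `sign_ext13` and `add_model` are hypothetical, and the `n`, `z`, and `c` bits are computed under the usual conventions (sign bit of the result, zero result, carry out of the top bit), which this section does not spell out. Only the overflow rule is taken directly from the text.

```python
MASK64 = (1 << 64) - 1


def sign_ext13(simm13):
    """Sign-extend a 13-bit immediate (hypothetical helper)."""
    simm13 &= 0x1FFF
    return simm13 - 0x2000 if simm13 & 0x1000 else simm13


def add_model(rs1_val, operand2, carry_in=0):
    """Model ADD/ADDcc (carry_in=0) or ADDC/ADDCcc (carry_in=icc.c).

    Returns (result, icc, xcc); icc and xcc are dicts with n, z, v, c.
    """
    a = rs1_val & MASK64
    b = operand2 & MASK64
    result = (a + b + carry_in) & MASK64

    def flags(width):
        mask = (1 << width) - 1
        sign = 1 << (width - 1)
        ra, rb, rr = a & mask, b & mask, result & mask
        return {
            "n": int(bool(rr & sign)),
            "z": int(rr == 0),
            # Overflow: operands share a sign and the sum's sign differs.
            "v": int(bool(~(ra ^ rb) & (ra ^ rr) & sign)),
            # Assumed convention: carry out of the top bit of this width.
            "c": int((ra + rb + carry_in) > mask),
        }

    return result, flags(32), flags(64)
```

For example, `add_model(0xFFFFFFFF, 1)` carries out of bit 31 but not out of bit 63, so in this model icc.c is 1 while xcc.c is 0. That difference is why the Programming Note stresses which carry bit ADDC reads.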


SPARC V8 Compatibility Note | ADDC and ADDCcc were previously named ADDX and 
ADDXcc, respectively, in SPARC V8. 


An attempt to execute an ADD, ADDcc, ADDC or ADDCcc instruction when i = 0 
and reserved instruction bits 12:5 are nonzero causes an illegal_instruction exception. 


Exceptions illegal_instruction 
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ALIGNADDRESS 





7.2 Align Address 


Instruction          opf          Operation                                                   Assembly Language Syntax               Class
ALIGNADDRESS         0 0001 1000  Calculate address for misaligned data access                alignaddr  reg_rs1, reg_rs2, reg_rd    A1
ALIGNADDRESS_LITTLE  0 0001 1010  Calculate address for misaligned data access little-endian  alignaddrl reg_rs1, reg_rs2, reg_rd    A1


Format: op = 10 (31:30) | rd (29:25) | op3 = 11 0110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)


Description ALIGNADDRESS adds two integer values, R[rs1] and R[rs2], and stores the result 
(with the least significant 3 bits forced to 0) in the integer register R[rd]. The least 
significant 3 bits of the result are stored in the GSR.align field. 


ALIGNADDRESS_LITTLE is the same as ALIGNADDRESS except that the two's 
complement of the least significant 3 bits of the result is stored in GSR.align. 


Note | ALIGNADDRESS_LITTLE generates the opposite-endian byte 
ordering for a subsequent FALIGNDATA operation. 
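
The two variants can be modelled in a few lines of Python. This is a hedged sketch rather than a reference implementation: the function name `alignaddr` and its `little` flag are hypothetical, and the two's complement is assumed to be truncated to the 3-bit width of GSR.align.

```python
MASK64 = (1 << 64) - 1


def alignaddr(rs1_val, rs2_val, little=False):
    """Return (R[rd], GSR.align) for ALIGNADDRESS or ALIGNADDRESS_LITTLE."""
    total = (rs1_val + rs2_val) & MASK64
    low3 = total & 0x7
    # ALIGNADDRESS_LITTLE stores the two's complement of the low 3 bits,
    # assumed here to wrap within the 3-bit align field.
    align = (-low3) & 0x7 if little else low3
    # The result written to R[rd] has its low 3 bits forced to 0.
    return total & ~0x7 & MASK64, align
```

For instance, with a base of 0x1000 and an offset of 5, this model yields an aligned address of 0x1000 in both cases, with an align value of 5 for ALIGNADDRESS and 3 for ALIGNADDRESS_LITTLE.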


A byte-aligned 64-bit load can be performed as shown below. 


    alignaddr   Address, Offset, Address   !set GSR.align
    ldd         [Address], %d0
    ldd         [Address + 8], %d2
    faligndata  %d0, %d2, %d4              !use GSR.align to select bytes





If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no 
FPU is present, an attempt to execute an ALIGNADDRESS or 
ALIGNADDRESS_LITTLE instruction causes an fp_disabled exception. 


Exceptions fp_disabled 


See Also Align Data on page 161 
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ALLCLEAN 





7.3 Mark All Register Window Sets “Clean” 





Instruction  Operation                                  Assembly Language Syntax  Class
ALLCLEAN^P   Mark all register window sets as "clean"   allclean                  A1


Format: op = 10 (31:30) | fcn = 0 0010 (29:25) | op3 = 11 0001 (24:19) | - (18:0)


Description The ALLCLEAN instruction marks all register window sets as "clean"; specifically, it 
performs the following operation: 


CLEANWIN ← (N_REG_WINDOWS - 1) 


Programming Note | ALLCLEAN is used to indicate that all register windows are 
"clean"; that is, do not contain data belonging to other address 
spaces. It is needed because the value of N_REG_WINDOWS is not 
known to privileged software. 


Exceptions illegal_instruction (not implemented in hardware in UltraSPARC Architecture 2005) 
privileged_opcode 


See Also INVALW on page 225 
NORMALW on page 274 
OTHERW on page 276 
RESTORED on page 294 
SAVED on page 302 
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AND, ANDN 





7.4 AND Logical Operation 


Instruction  op3      Operation                    Assembly Language Syntax             Class
AND          00 0001  and                          and    reg_rs1, reg_or_imm, reg_rd   A1
ANDcc        01 0001  and and modify cc's          andcc  reg_rs1, reg_or_imm, reg_rd   A1
ANDN         00 0101  and not                      andn   reg_rs1, reg_or_imm, reg_rd   A1
ANDNcc       01 0101  and not and modify cc's      andncc reg_rs1, reg_or_imm, reg_rd   A1


Format (i = 0): op = 10 (31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i = 0 (13) | - (12:5) | rs2 (4:0)
Format (i = 1): op = 10 (31:30) | rd (29:25) | op3 (24:19) | rs1 (18:14) | i = 1 (13) | simm13 (12:0)


Description These instructions implement bitwise logical and operations. They compute "R[rs1] 
op R[rs2]" if i = 0, or "R[rs1] op sign_ext(simm13)" if i = 1, and write the result into 
R[rd]. 


ANDcc and ANDNcc modify the integer condition codes (icc and xcc). They set the 
condition codes as follows: 


icc.v, icc.c, xcc.v, and xcc.c are set to 0 

icc.n is copied from bit 31 of the result 

xcc.n is copied from bit 63 of the result 

icc.z is set to 1 if bits 31:0 of the result are zero (otherwise to 0) 
xcc.z is set to 1 if all 64 bits of the result are zero (otherwise to 0) 


ANDN and ANDNcc logically negate their second operand before applying the 
main (and) operation. 
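
The condition-code rules listed above translate directly into code. The following Python sketch is illustrative only; `logical_op` and `logical_cc` are hypothetical names, and the second operand is assumed to have been sign-extended already when i = 1.

```python
MASK64 = (1 << 64) - 1


def logical_op(rs1_val, operand2, negate_second=False):
    """AND (negate_second=False) or ANDN (negate_second=True) on 64 bits."""
    if negate_second:
        operand2 = ~operand2  # ANDN negates the second operand first
    return (rs1_val & operand2) & MASK64


def logical_cc(result):
    """Condition codes that ANDcc/ANDNcc would set for a 64-bit result."""
    result &= MASK64
    icc = {
        "n": (result >> 31) & 1,                 # bit 31 of the result
        "z": int((result & 0xFFFFFFFF) == 0),    # bits 31:0 all zero
        "v": 0,
        "c": 0,
    }
    xcc = {
        "n": (result >> 63) & 1,                 # bit 63 of the result
        "z": int(result == 0),                   # all 64 bits zero
        "v": 0,
        "c": 0,
    }
    return icc, xcc
```

A result such as 0x1_0000_0000 shows why icc.z and xcc.z are tracked separately: its low 32 bits are zero, so icc.z is 1, while xcc.z is 0.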


An attempt to execute an AND, ANDcc, ANDN or ANDNcc instruction when i = 0 
and reserved instruction bits 12:5 are nonzero causes an illegal_instruction exception. 


Exceptions illegal_instruction 
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ARRAY<8|16|32> 





7.5 Three-Dimensional Array Addressing 
[VIS 1] 








Instruction  opf          Operation                                          Assembly Language Syntax           Class
ARRAY8       0 0001 0000  Convert 8-bit 3D address to blocked byte address   array8  reg_rs1, reg_rs2, reg_rd   C3
ARRAY16      0 0001 0010  Convert 16-bit 3D address to blocked byte address  array16 reg_rs1, reg_rs2, reg_rd   C3
ARRAY32      0 0001 0100  Convert 32-bit 3D address to blocked byte address  array32 reg_rs1, reg_rs2, reg_rd   C3


Format: op = 10 (31:30) | rd (29:25) | op3 = 11 0110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)


Description These instructions convert three-dimensional (3D) fixed-point addresses contained 
in R[rs1] to a blocked-byte address; they store the result in R[rd]. Fixed-point 
addresses typically are used for address interpolation for planar reformatting 
operations. Blocking is performed at the 64-byte level to maximize external cache 
block reuse, and at the 64-Kbyte level to maximize TLB entry reuse, regardless of the 
orientation of the address interpolation. These instructions specify an element size of 
8 bits (ARRAY8), 16 bits (ARRAY16), or 32 bits (ARRAY32). 


The second operand, R[rs2], specifies the power-of-2 size of the X and Y dimensions 
of a 3D image array. The legal values for R[rs2] and their meanings are shown in 
TABLE 7-4. Illegal values produce undefined results in the destination register, R[rd]. 


TABLE 7-4 3D R[rs2] Array X and Y Dimensions 
R[rs2] Value (n)   Number of Elements 
0                  64 
1                  128 
2                  256 
3                  512 
4                  1024 
5                  2048 





Implementation | Architecturally, an illegal R[rs2] value (>5) causes the array 
Note | instructions to produce undefined results. For historic reference, 
past implementations of these instructions have ignored 
R[rs2]{63:3} and have treated R[rs2] values of 6 and 7 as if they 
were 5. 


The array instructions facilitate 3D texture mapping and volume rendering by 
computing a memory address for data lookup based on fixed-point x, y, and z 
coordinates. The data are laid out in a blocked fashion, so that points which are near 
one another have their data stored in nearby memory locations. 


138 UltraSPARC Architecture 2005 * Draft DO.9.2, 19 Jun 2008 




If the texture data were laid out in the obvious fashion (the z = 0 plane, followed by 
the z = 1 plane, etc.), then even small changes in z would result in references to 
distant pages in memory. The resulting lack of locality would tend to result in TLB 
misses and poor performance. The three versions of the array instruction, ARRAY8, 
ARRAY16, and ARRAY32, differ only in the scaling of the computed memory offsets. 
ARRAY16 shifts its result left by one position and ARRAY32 shifts left by two in 
order to handle 16- and 32-bit texture data. 


When using the array instructions, a "blocked-byte" data formatting structure is 
imposed. The N x N x M volume, where N = 2^n x 64, M = m x 32, 0 <= n <= 5, 1 <= m <= 16, 
should be composed of 64 x 64 x 32 smaller volumes, which in turn should be 
composed of 4 x 4 x 2 volumes. This data structure is optimal for 16-bit data. For 
16-bit data, the 4 x 4 x 2 volume has 64 bytes of data, which is ideal for reducing 
cache-line misses; the 64 x 64 x 32 volume will have 256 Kbytes of data, which is good for 
improving the TLB hit rate. FIGURE 7-1 illustrates how the data has to be organized, 
where the origin (0,0,0) is assumed to be at the lower-left front corner and the x 
coordinate varies fastest, then y, then z. That is, when traversing the volume from the 
origin to the upper right back, you go from left to right, front to back, bottom to top. 





[Figure: a volume with edge length N = 2^n x 64, built from blocks of 16 x 4 = 64 by 16 x 4 = 64 by 16 x 2 = 32 elements] 


FIGURE 7-1 Blocked-Byte Data Formatting Structure 


The array instructions have 2 inputs: 


The (x,y,z) coordinates are input via a single 64-bit integer organized in R[rs1] as 
shown in FIGURE 7-2. 








Field        Bits 
Z integer    63:55 
Z fraction   54:44 
Y integer    43:33 
Y fraction   32:22 
X integer    21:11 
X fraction   10:0 


FIGURE 7-2 Three-Dimensional Array Fixed-Point Address Format 


Note that z has only 9 integer bits, as opposed to 11 for x and y. Also note that since 
(x,y,z) are all contained in one 64-bit register, they can be incremented or 
decremented simultaneously with a single add or subtract instruction (ADD or 
SUB). 


So for a 512 x 512 x 32 or a 512 x 512 x 256 volume, the size value is 3. Note that the 
x and y size of the volume must be the same. The z size of the volume is a multiple 
of 32, ranging between 32 and 512. 


The array instructions generate an integer memory offset that, when added to the 
base address of the volume, gives the address of the volume element (voxel) and can 
be used by a load instruction. The offset is correct only if the data has been 
reformatted as specified above. 


The integer parts of x, y, and z are converted to the following blocked-address 
formats as shown in FIGURE 7-3 for ARRAY8, FIGURE 7-4 for ARRAY16, and FIGURE 7-5 
for ARRAY32. 




















          UPPER                                      MIDDLE                 LOWER 
          Z            Y               X             Z      Y      X        Z   Y    X 
ARRAY8    20+2n:17+2n  17+2n-1:17+n    17+n-1:17     16:13  12:9   8:5      4   3:2  1:0 


FIGURE 7-3 Three-Dimensional Array Blocked-Address Format (ARRAY8) 


          UPPER                                      MIDDLE                 LOWER 
          Z            Y               X             Z      Y      X        Z   Y    X     zero 
ARRAY16   21+2n:18+2n  18+2n-1:18+n    18+n-1:18     17:14  13:10  9:6      5   4:3  2:1   0 


FIGURE 7-4 Three-Dimensional Array Blocked-Address Format (ARRAY16) 


          UPPER                                      MIDDLE                 LOWER 
          Z            Y               X             Z      Y      X        Z   Y    X     zero 
ARRAY32   22+2n:19+2n  19+2n-1:19+n    19+n-1:19     18:15  14:11  10:7     6   5:4  3:2   1:0 


FIGURE 7-5 Three-Dimensional Array Blocked-Address Format (ARRAY32) 
The bits above Z upper are set to 0. The number of zeroes in the least significant bits 
is determined by the element size. An element size of 8 bits has no zeroes, an 
element size of 16 bits has one zero, and an element size of 32 bits has two zeroes. 
Bits in X and Y above the size specified by R[rs2] are ignored. 
TABLE 7-5 ARRAY8 Description 
Result (R[rd]) Bits   Source (R[rs1]) Bits   Field Information 
1:0                   12:11                  X_integer{1:0} 
3:2                   34:33                  Y_integer{1:0} 
4                     55                     Z_integer{0} 
8:5                   16:13                  X_integer{5:2} 
12:9                  38:35                  Y_integer{5:2} 
16:13                 59:56                  Z_integer{4:1} 
17+n-1:17             17+n-1:17              X_integer{6+n-1:6} 
17+2n-1:17+n          39+n-1:39              Y_integer{6+n-1:6} 
20+2n:17+2n           63:60                  Z_integer{8:5} 
63:20+2n+1            n/a                    0 
In the above description, if n = 0, there are 64 elements, so X_integer{6} and 
Y_integer{6} are not defined. That is, result{20:17} equals Z_integer{8:5}. 
Note | To maximize reuse of external cache and TLB data, software 
should block array references of a large image to the 64-Kbyte 
level. This means processing elements within a 32 x 32 x 64 
block. 
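
The bit mapping in TABLE 7-5 can be expressed as a short Python model. This is a sketch under stated assumptions, not the specification's definition: `array_offset` and its `elem_shift` parameter are hypothetical names, ARRAY16 and ARRAY32 are modelled as the ARRAY8 result shifted left by one or two positions as the text describes, and illegal size values are rejected here even though the architecture leaves their results undefined.

```python
def array_offset(rs1_val, n, elem_shift=0):
    """Blocked-byte offset for ARRAY8 (elem_shift=0), ARRAY16 (1), ARRAY32 (2).

    rs1_val holds the fixed-point (x, y, z) address; n is the R[rs2] size value.
    """
    if not 0 <= n <= 5:
        # Architecturally undefined; this model simply refuses the input.
        raise ValueError("R[rs2] size value must be in 0..5")
    x = (rs1_val >> 11) & 0x7FF   # X integer, bits 21:11
    y = (rs1_val >> 33) & 0x7FF   # Y integer, bits 43:33
    z = (rs1_val >> 55) & 0x1FF   # Z integer, bits 63:55

    # LOWER: X{1:0}, Y{1:0}, Z{0} -> result bits 4:0
    lower = (x & 0x3) | ((y & 0x3) << 2) | ((z & 0x1) << 4)
    # MIDDLE: X{5:2}, Y{5:2}, Z{4:1} -> result bits 16:5
    middle = ((x >> 2) & 0xF) | (((y >> 2) & 0xF) << 4) | (((z >> 1) & 0xF) << 8)
    # UPPER: n bits of X and Y starting at bit 6, then Z{8:5}
    size_mask = (1 << n) - 1
    upper = (((x >> 6) & size_mask)
             | (((y >> 6) & size_mask) << n)
             | (((z >> 5) & 0xF) << (2 * n)))

    return (lower | (middle << 5) | (upper << 17)) << elem_shift
```

With n = 0 the upper X and Y fields are empty, so an X integer of 64 contributes nothing to the offset, which matches the statement that bits in X and Y above the specified size are ignored.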
The code fragment below shows assembly of components along an interpolated line 
at the rate of one component per clock. 

    add         Addr, DeltaAddr, Addr
    array8      Addr, %g0, bAddr
    ldda        [bAddr] #ASI_FL8_PRIMARY, data
    faligndata  data, accum, accum

Exceptions None 
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Bicc 





7.6 Branch on Integer Condition Codes 
(Bicc) 





Opcode  cond  Operation                                               icc Test               Assembly Language Syntax  Class
BA      1000  Branch Always                                           1                      ba{,a}   label            A1
BN      0000  Branch Never                                            0                      bn{,a}   label            A1
BNE     1001  Branch on Not Equal                                     not Z                  bne†{,a} label            A1
BE      0001  Branch on Equal                                         Z                      be‡{,a}  label            A1
BG      1010  Branch on Greater                                       not (Z or (N xor V))   bg{,a}   label            A1
BLE     0010  Branch on Less or Equal                                 Z or (N xor V)         ble{,a}  label            A1
BGE     1011  Branch on Greater or Equal                              not (N xor V)          bge{,a}  label            A1
BL      0011  Branch on Less                                          N xor V                bl{,a}   label            A1
BGU     1100  Branch on Greater Unsigned                              not (C or Z)           bgu{,a}  label            A1
BLEU    0100  Branch on Less or Equal Unsigned                        C or Z                 bleu{,a} label            A1
BCC     1101  Branch on Carry Clear (Greater Than or Equal, Unsigned) not C                  bcc◊{,a} label            A1
BCS     0101  Branch on Carry Set (Less Than, Unsigned)               C                      bcs∇{,a} label            A1
BPOS    1110  Branch on Positive                                      not N                  bpos{,a} label            A1
BNEG    0110  Branch on Negative                                      N                      bneg{,a} label            A1
BVC     1111  Branch on Overflow Clear                                not V                  bvc{,a}  label            A1
BVS     0111  Branch on Overflow Set                                  V                      bvs{,a}  label            A1
† synonym: bnz   ‡ synonym: bz   ◊ synonym: bgeu   ∇ synonym: blu 


Format: op = 00 (31:30) | a (29) | cond (28:25) | op2 = 010 (24:22) | disp22 (21:0)


Programming Note | To set the annul (a) bit for Bicc instructions, append ",a" to the 
opcode mnemonic. For example, use "bgu,a label". In the 
preceding table, braces signify that the ",a" is optional. 


Unconditional branches and icc-conditional branches are described below: 


■ Unconditional branches (BA, BN) - If its annul bit is 0 (a = 0), a BN (Branch 
Never) instruction is treated as a NOP. If its annul bit is 1 (a = 1), the following 
(delay) instruction is annulled (not executed). In neither case does a transfer of 
control take place. 


BA (Branch Always) causes an unconditional PC-relative, delayed control transfer 
to the address "PC + (4 x sign_ext(disp22))". If the annul (a) bit of the branch 
instruction is 1, the delay instruction is annulled (not executed). If the annul bit is 
0 (a = 0), the delay instruction is executed. 


■ icc-conditional branches - Conditional Bicc instructions (all except BA and BN) 
evaluate the 32-bit integer condition codes (icc), according to the cond field of the 
instruction, producing either a TRUE or FALSE result. If TRUE, the branch is taken, 
that is, the instruction causes a PC-relative, delayed control transfer to the 
address "PC + (4 x sign_ext(disp22))". If FALSE, the branch is not taken. 




















If a conditional branch is taken, the delay instruction is always executed 
regardless of the value of the annul field. If a conditional branch is not taken and 
the annul bit is 1 (a = 1), the delay instruction is annulled (not executed). 


Note | The annul bit has a different effect on conditional branches than 
it does on unconditional branches. 
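
The condition tests, the target computation, and the annul rules described in this section can be collected into one Python sketch. It is illustrative only: `ICC_TESTS`, `branch_target`, and `delay_slot_executed` are hypothetical names, and the condition-code bits are passed as integers 0 or 1.

```python
MASK64 = (1 << 64) - 1

# cond field -> test on the icc bits (n, z, v, c), per the opcode table.
ICC_TESTS = {
    0b1000: lambda n, z, v, c: True,                 # BA
    0b0000: lambda n, z, v, c: False,                # BN
    0b1001: lambda n, z, v, c: not z,                # BNE
    0b0001: lambda n, z, v, c: bool(z),              # BE
    0b1010: lambda n, z, v, c: not (z or (n ^ v)),   # BG
    0b0010: lambda n, z, v, c: bool(z or (n ^ v)),   # BLE
    0b1011: lambda n, z, v, c: not (n ^ v),          # BGE
    0b0011: lambda n, z, v, c: bool(n ^ v),          # BL
    0b1100: lambda n, z, v, c: not (c or z),         # BGU
    0b0100: lambda n, z, v, c: bool(c or z),         # BLEU
    0b1101: lambda n, z, v, c: not c,                # BCC
    0b0101: lambda n, z, v, c: bool(c),              # BCS
    0b1110: lambda n, z, v, c: not n,                # BPOS
    0b0110: lambda n, z, v, c: bool(n),              # BNEG
    0b1111: lambda n, z, v, c: not v,                # BVC
    0b0111: lambda n, z, v, c: bool(v),              # BVS
}


def branch_target(pc, disp22):
    """PC + (4 x sign_ext(disp22)), wrapped to 64 bits."""
    disp = disp22 & 0x3FFFFF
    if disp & 0x200000:
        disp -= 0x400000
    return (pc + 4 * disp) & MASK64


def delay_slot_executed(cond, a, taken):
    """Whether the delay instruction executes, per the annul rules above."""
    if cond in (0b1000, 0b0000):   # BA and BN: annul bit alone decides
        return a == 0
    return taken or a == 0         # conditional: annulled only if untaken and a = 1
```

The last function makes the Note concrete: with a = 1, a taken BA annuls its delay instruction, whereas a taken conditional branch still executes it.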


Annulment, delay instructions, and delayed control transfers are described further 
in Chapter 6, Instruction Set Overview. 


Exceptions None 
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BMASK / BSHUFFLE 





7.7 Byte Mask and Shuffle 


Instruction  opf          Operation                                                                   Assembly Language Syntax               Class
BMASK        0 0001 1001  Set the GSR.mask field in preparation for a subsequent BSHUFFLE instruction bmask    reg_rs1, reg_rs2, reg_rd      C3
BSHUFFLE     0 0100 1100  Permute 16 bytes as specified by GSR.mask                                   bshuffle freg_rs1, freg_rs2, freg_rd   C3


Format: op = 10 (31:30) | rd (29:25) | op3 = 11 0110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)


Description BMASK adds two integer registers, R[rs1] and R[rs2], and stores the result in the 
integer register R[rd]. The least significant 32 bits of the result are stored in the 
GSR.mask field. 


BSHUFFLE concatenates the two 64-bit floating-point registers Fp[rs1] (more 
significant half) and Fp[rs2] (less significant half) to form a 128-bit (16-byte) value. 
Bytes in the concatenated value are numbered from most significant to least 
significant, with the most significant byte being byte 0. BSHUFFLE extracts 8 of 
those 16 bytes and stores the result in the 64-bit floating-point register Fp[rd]. Bytes 
in Fp[rd] are also numbered from most to least significant, with the most significant 
being byte 0. The following table indicates which source byte is extracted from the 
concatenated value to generate each byte in the destination register, Fp[rd]. 





Destination Byte (in F[rd])   Source Byte 
0 (most significant)          (Fp[rs1] :: Fp[rs2]){GSR.mask{31:28}} 
1                             (Fp[rs1] :: Fp[rs2]){GSR.mask{27:24}} 
2                             (Fp[rs1] :: Fp[rs2]){GSR.mask{23:20}} 
3                             (Fp[rs1] :: Fp[rs2]){GSR.mask{19:16}} 
4                             (Fp[rs1] :: Fp[rs2]){GSR.mask{15:12}} 
5                             (Fp[rs1] :: Fp[rs2]){GSR.mask{11:8}} 
6                             (Fp[rs1] :: Fp[rs2]){GSR.mask{7:4}} 
7 (least significant)         (Fp[rs1] :: Fp[rs2]){GSR.mask{3:0}} 
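
The byte-selection table can be read as the following Python model. It is a sketch for illustration; `bmask` and `bshuffle` are hypothetical function names, register values are passed as plain 64-bit integers, and `gsr_mask` stands for the 32-bit GSR.mask field.

```python
MASK64 = (1 << 64) - 1


def bmask(rs1_val, rs2_val):
    """Return (R[rd], GSR.mask): the sum and its least significant 32 bits."""
    total = (rs1_val + rs2_val) & MASK64
    return total, total & 0xFFFFFFFF


def bshuffle(f_rs1, f_rs2, gsr_mask):
    """Select 8 of the 16 bytes of f_rs1 :: f_rs2 as directed by gsr_mask."""
    concat = ((f_rs1 & MASK64) << 64) | (f_rs2 & MASK64)
    result = 0
    for dest in range(8):
        # Destination byte 0 uses mask bits 31:28, byte 7 uses bits 3:0.
        index = (gsr_mask >> (28 - 4 * dest)) & 0xF
        # Source byte 0 is the most significant byte of the 128-bit value.
        byte = (concat >> (8 * (15 - index))) & 0xFF
        result = (result << 8) | byte
    return result
```

As a check on the numbering, a mask of 0x01234567 returns the first source register unchanged and 0x89ABCDEF returns the second, since source bytes 0 to 7 come from Fp[rs1] and bytes 8 to 15 from Fp[rs2].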


If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no 
FPU is present, an attempt to execute a BMASK or BSHUFFLE instruction causes an 
fp_disabled exception. 


Exceptions fp_disabled 
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7.8 Branch on Integer Condition Codes with 
Prediction (BPcc) 











Instruction  cond  Operation                                                cc Test                Assembly Language Syntax                  Class
BPA          1000  Branch Always                                            1                      ba{,a}{,pt|,pn}   i_or_x_cc, label        A1
BPN          0000  Branch Never                                             0                      bn{,a}{,pt|,pn}   i_or_x_cc, label        A1
BPNE         1001  Branch on Not Equal                                      not Z                  bne†{,a}{,pt|,pn} i_or_x_cc, label        A1
BPE          0001  Branch on Equal                                          Z                      be‡{,a}{,pt|,pn}  i_or_x_cc, label        A1
BPG          1010  Branch on Greater                                        not (Z or (N xor V))   bg{,a}{,pt|,pn}   i_or_x_cc, label        A1
BPLE         0010  Branch on Less or Equal                                  Z or (N xor V)         ble{,a}{,pt|,pn}  i_or_x_cc, label        A1
BPGE         1011  Branch on Greater or Equal                               not (N xor V)          bge{,a}{,pt|,pn}  i_or_x_cc, label        A1
BPL          0011  Branch on Less                                           N xor V                bl{,a}{,pt|,pn}   i_or_x_cc, label        A1
BPGU         1100  Branch on Greater Unsigned                               not (C or Z)           bgu{,a}{,pt|,pn}  i_or_x_cc, label        A1
BPLEU        0100  Branch on Less or Equal Unsigned                         C or Z                 bleu{,a}{,pt|,pn} i_or_x_cc, label        A1
BPCC         1101  Branch on Carry Clear (Greater than or Equal, Unsigned)  not C                  bcc◊{,a}{,pt|,pn} i_or_x_cc, label        A1
BPCS         0101  Branch on Carry Set (Less than, Unsigned)                C                      bcs∇{,a}{,pt|,pn} i_or_x_cc, label        A1
BPPOS        1110  Branch on Positive                                       not N                  bpos{,a}{,pt|,pn} i_or_x_cc, label        A1
BPNEG        0110  Branch on Negative                                       N                      bneg{,a}{,pt|,pn} i_or_x_cc, label        A1
BPVC         1111  Branch on Overflow Clear                                 not V                  bvc{,a}{,pt|,pn}  i_or_x_cc, label        A1
BPVS         0111  Branch on Overflow Set                                   V                      bvs{,a}{,pt|,pn}  i_or_x_cc, label        A1
† synonym: bnz   ‡ synonym: bz   ◊ synonym: bgeu   ∇ synonym: blu 


Format: op = 00 (31:30) | a (29) | cond (28:25) | op2 = 001 (24:22) | cc1 (21) | cc0 (20) | p (19) | disp19 (18:0)


cc1  cc0  Condition Code 
0    0    icc 
0    1    - 
1    0    xcc 
1    1    - 
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Programming | To set the annul (a) bit for BPcc instructions, append ^, a” to the 
Note | opcode mnemonic. For example, use bgu, a %icc, label. Braces in 

the preceding table signify that the ", a” is optional. To set the 

branch prediction bit, append to an opcode mnemonic either 

"^, pt" for predict taken or ", pn" for predict not taken. If neither 

^, pt" nor ", pn" is specified, the assembler defaults to ",pt". To 


Mo 


select the appropriate integer condition code, include "$icc" or 


Wo 


%xcc” before the label. 





Description Unconditional branches and conditional branches are described below. 


■ Unconditional branches (BPA, BPN) — A BPN (Branch Never with Prediction) 
instruction for this branch type (op2 = 1) may be used in the SPARC V9 
architecture as an instruction prefetch; that is, the effective address (PC + (4 × 
sign_ext(disp19))) specifies an address of an instruction that is expected to be 
executed soon. If the Branch Never’s annul bit is 1 (a = 1), then the following 
(delay) instruction is annulled (not executed). If the annul bit is 0 (a = 0), then the 
following instruction is executed. In no case does a Branch Never cause a transfer 
of control to take place. 


BPA (Branch Always with Prediction) causes an unconditional PC-relative, 
delayed control transfer to the address “PC + (4 × sign_ext(disp19))”. If the annul 
bit of the branch instruction is 1 (a = 1), then the delay instruction is annulled (not 
executed). If the annul bit is 0 (a = 0), then the delay instruction is executed. 


■ Conditional branches — Conditional BPcc instructions (except BPA and BPN) 
evaluate one of the two integer condition codes (icc or xcc), as selected by cc0 
and cc1, according to the cond field of the instruction, producing either a TRUE or 
FALSE result. If TRUE, the branch is taken; that is, the instruction causes a PC-relative, 
delayed control transfer to the address “PC + (4 × sign_ext(disp19))”. If 
FALSE, the branch is not taken. 




















If a conditional branch is taken, the delay instruction is always executed 
regardless of the value of the annul (a) bit. If a conditional branch is not taken 
and the annul bit is 1 (a = 1), the delay instruction is annulled (not executed). 


Note | The annul bit has a different effect on conditional branches than 
it does on unconditional branches. 
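
The cond tests in the instruction table and the annul rules described above can be summarized as a small model. This is an illustrative Python sketch only. The function names and the use of Python booleans for the N, Z, V, and C bits are assumptions, and the sketch relies on the observation (readable from the table) that setting cond bit 3 complements the test selected by the low three bits.

```python
def bpcc_taken(cond: int, n: bool, z: bool, v: bool, c: bool) -> bool:
    """Evaluate the 4-bit cond field against one set of condition codes
    (icc or xcc, already selected by cc1/cc0)."""
    base_tests = {
        0b000: False,            # BPN
        0b001: z,                # BPE
        0b010: z or (n != v),    # BPLE
        0b011: n != v,           # BPL
        0b100: c or z,           # BPLEU
        0b101: c,                # BPCS
        0b110: n,                # BPNEG
        0b111: v,                # BPVS
    }
    result = base_tests[cond & 0b0111]
    # cond bit 3 selects the complementary test (BPA, BPNE, BPG, ...).
    return (not result) if (cond & 0b1000) else result


def delay_slot_executes(cond: int, annul: bool, taken: bool) -> bool:
    """Return True if the delay instruction is executed (not annulled)."""
    if not annul:
        return True                      # a = 0: delay instruction always runs
    if (cond & 0b0111) == 0:
        return False                     # BPA/BPN with a = 1: always annulled
    return taken                         # conditional: annulled only if not taken
```

For example, bpcc_taken(0b1010, ...) models BPG, and delay_slot_executes shows why ",a" behaves differently on ba than on bne.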


The predict bit (p) is used to give the hardware a hint about whether the branch is 
expected to be taken. A 1 in the p bit indicates that the branch is expected to be 
taken; a 0 indicates that the branch is expected not to be taken. 


Annulment, delay instructions, prediction, and delayed control transfers are 
described further in Chapter 6, Instruction Set Overview. 


An attempt to execute a BPcc instruction with cc0 = 1 (a reserved value) causes an 
illegal_instruction exception. 


Exceptions illegal_instruction 

See Also Branch on Integer Register with Prediction (BPr) on page 148 

7.9 Branch on Integer Register with 
Prediction (BPr) 





                                                               Register
                                                               Contents
Instruction  rcond  Operation                                  Test         Assembly Language Syntax                Class
—            000    Reserved                                   —            —                                       —
BRZ          001    Branch on Register Zero                    R[rs1] = 0   brz{,a}{,pt|,pn}    reg_rs1, label      A1
BRLEZ        010    Branch on Register Less Than or Equal      R[rs1] ≤ 0   brlez{,a}{,pt|,pn}  reg_rs1, label      A1
                    to Zero
BRLZ         011    Branch on Register Less Than Zero          R[rs1] < 0   brlz{,a}{,pt|,pn}   reg_rs1, label      A1
—            100    Reserved                                   —            —                                       —
BRNZ         101    Branch on Register Not Zero                R[rs1] ≠ 0   brnz{,a}{,pt|,pn}   reg_rs1, label      A1
BRGZ         110    Branch on Register Greater Than Zero       R[rs1] > 0   brgz{,a}{,pt|,pn}   reg_rs1, label      A1
BRGEZ        111    Branch on Register Greater Than or         R[rs1] ≥ 0   brgez{,a}{,pt|,pn}  reg_rs1, label      A1
                    Equal to Zero

|  00   | a  | 0* | rcond  |  011   | d16hi  | p  |  rs1   | d16lo  |
 31  30   29   28   27  25   24  22   21  20   19   18  14   13   0

* Although SPARC V9 implementations should cause an illegal_instruction exception when bit 28 = 1, many 
early implementations ignored the value of this bit and executed the opcode as a BPr instruction even if 
bit 28 = 1. 
Programming | To set the annul (a) bit for BPr instructions, append ",a" to the 
Note | opcode mnemonic. For example, use "brz,a %i3, label". In the 
preceding table, braces signify that the ",a" is optional. To set the 
branch prediction bit p, append either ",pt" for predict taken or 
",pn" for predict not taken to the opcode mnemonic. If neither 
",pt" nor ",pn" is specified, the assembler defaults to ",pt". 


Description These instructions branch based on the contents of R[rs1]. They treat the register 
contents as a signed integer value. 


A BPr instruction examines all 64 bits of R[rs1] according to the rcond field of the 
instruction, producing either a TRUE or FALSE result. If TRUE, the branch is taken; 
that is, the instruction causes a PC-relative, delayed control transfer to the address 
“PC + (4 × sign_ext(d16hi :: d16lo))”. If FALSE, the branch is not taken. 
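
The rcond test and the target address computation can be sketched as follows. This Python model is illustrative only. The function names are hypothetical, and the field widths (d16hi in bits 21:20, d16lo in bits 13:0) are read from the bit positions in the format diagram above.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def bpr_taken(rcond: int, r_rs1: int) -> bool:
    """Evaluate rcond against all 64 bits of R[rs1]."""
    x = to_signed64(r_rs1)
    tests = {
        0b001: x == 0,   # BRZ
        0b010: x <= 0,   # BRLEZ
        0b011: x < 0,    # BRLZ
        0b101: x != 0,   # BRNZ
        0b110: x > 0,    # BRGZ
        0b111: x >= 0,   # BRGEZ
    }
    if rcond not in tests:
        # 000 and 100 are reserved; hardware raises illegal_instruction.
        raise ValueError("reserved rcond value")
    return tests[rcond]


def bpr_target(pc: int, d16hi: int, d16lo: int) -> int:
    """Compute PC + (4 * sign_ext(d16hi :: d16lo))."""
    disp16 = ((d16hi & 0x3) << 14) | (d16lo & 0x3FFF)
    if disp16 & 0x8000:
        disp16 -= 1 << 16            # sign-extend the 16-bit word displacement
    return (pc + 4 * disp16) & MASK64
```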




















If the branch is taken, the delay instruction is always executed, regardless of the 
value of the annul (a) bit. If the branch is not taken and the annul bit is 1 (a = 1), the 
delay instruction is annulled (not executed). 


The predict bit (p) gives the hardware a hint about whether the branch is expected to 
be taken. If p = 1, the branch is expected to be taken; p = 0 indicates that the branch 
is expected not to be taken. 


An attempt to execute a BPr instruction when instruction bit 28 = 1 or rcond is a 
reserved value (000₂ or 100₂) causes an illegal_instruction exception. 


Annulment, delay instructions, prediction, and delayed control transfers are 
described further in Chapter 6, Instruction Set Overview. 


Implementation | If this instruction is implemented by tagging each register value 
Note | with an N (negative) bit and Z (zero) bit, the table below can be 
used to determine if rcond is TRUE: 

Branch   Test
BRNZ     not Z
BRZ      Z
BRGEZ    not N
BRLZ     N
BRLEZ    N or Z
BRGZ     not (N or Z)


Exceptions illegal_instruction 


See Also Branch on Integer Condition Codes with Prediction (BPcc) on page 145 

7.10 Call and Link 


Instruction op Operation Assembly Language Syntax Class 


CALL 01 Call and Link call label A1 





|  01   |  disp30  |
 31  30   29      0


Description The CALL instruction causes an unconditional, delayed, PC-relative control transfer 
to address PC + (4 × sign_ext(disp30)). Since the word displacement (disp30) field is 
30 bits wide, the target address lies within a range of −2³¹ to +2³¹ − 4 bytes. The PC-relative 
displacement is formed by sign-extending the 30-bit word displacement field 
to 62 bits and appending two low-order zeroes to obtain a 64-bit byte displacement. 
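
The displacement arithmetic can be checked with a short model. The following Python sketch is illustrative only. The function name call and its am parameter (standing in for PSTATE.am) are assumptions, and the sketch returns the pair (target address, value written to R[15]) rather than modeling the delayed control transfer.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def call(pc: int, disp30: int, am: bool = False):
    """Return (target address, value written to R[15]) for a CALL at pc."""
    disp30 &= (1 << 30) - 1
    if disp30 & (1 << 29):
        disp30 -= 1 << 30                 # sign-extend the 30-bit word displacement
    target = (pc + 4 * disp30) & MASK64   # append two low-order zero bits
    link = pc & MASK64                    # address of the CALL itself
    if am:
        # 32-bit address masking: clear the more-significant 32 bits.
        target &= MASK32
        link &= MASK32
    return target, link
```

The extreme displacements give 4 × (−2²⁹) = −2³¹ and 4 × (2²⁹ − 1) = 2³¹ − 4, matching the range stated above.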


The CALL instruction also writes the value of PC, which contains the address of the 
CALL, into R[15] (out register 7). 


When PSTATE.am = 1, the more-significant 32 bits of the target instruction address 
are masked out (set to 0) before being sent to the memory system and in the address 
written into R[15]. (closed impl. dep. #125-V9-Cs10) 


Exceptions None 


See Also JMPL on page 226 

7.11 Compare and Swap 


Instruction   op3      Operation                        Assembly Language Syntax                    Class
CASA^P_ASI    11 1100  Compare and Swap Word from       casa   [reg_rs1] imm_asi, reg_rs2, reg_rd   A1
                       Alternate Space                  casa   [reg_rs1] %asi, reg_rs2, reg_rd
CASXA^P_ASI   11 1110  Compare and Swap Extended from   casxa  [reg_rs1] imm_asi, reg_rs2, reg_rd   A1
                       Alternate Space                  casxa  [reg_rs1] %asi, reg_rs2, reg_rd

|  11   |  rd   |  op3  |  rs1  | i=0 | imm_asi |  rs2  |
|  11   |  rd   |  op3  |  rs1  | i=1 |    —    |  rs2  |
 31  30   29 25   24 19   18 14    13   12    5   4   0


Description Concurrent processes use these instructions for synchronization and memory 
updates. Uses of compare-and-swap include spin-lock operations, updates of shared 
counters, and updates of linked-list pointers. The last two can use wait-free 
(nonlocking) protocols. 


The CASXA instruction compares the value in register R[rs2] with the doubleword 
in memory pointed to by the doubleword address in R[rs1]. If the values are equal, 
the value in R[rd] is swapped with the doubleword pointed to by the doubleword 
address in R[rs1]. If the values are not equal, the contents of the doubleword 
pointed to by R[rs1] replace the value in R[rd], but the memory location remains 
unchanged. 


The CASA instruction compares the low-order 32 bits of register R[rs2] with a word 
in memory pointed to by the word address in R[rs1]. If the values are equal, then the 
low-order 32 bits of register R[rd] are swapped with the contents of the memory 
word pointed to by the address in R[rs1] and the high-order 32 bits of register R[rd] 
are set to 0. If the values are not equal, the memory location remains unchanged, but 
the contents of the memory word pointed to by R[rs1] replace the low-order 32 bits 
of R[rd] and the high-order 32 bits of register R[rd] are set to 0. 
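
The register and memory effects described above can be expressed as a small model. This Python sketch is illustrative only: memory is a plain dictionary keyed by address, the function names are hypothetical, and atomicity, ASI selection, alignment checks, and traps are not modeled. Each function returns the new value of R[rd].

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def casxa(mem: dict, addr: int, r_rs2: int, r_rd: int) -> int:
    """Model CASXA on a 64-bit memory cell; returns the new R[rd]."""
    old = mem[addr] & MASK64
    if old == (r_rs2 & MASK64):
        mem[addr] = r_rd & MASK64      # values equal: store R[rd]
    return old                         # R[rd] always receives the old memory value


def casa(mem: dict, addr: int, r_rs2: int, r_rd: int) -> int:
    """Model CASA on a 32-bit memory cell; returns the new R[rd]."""
    old = mem[addr] & MASK32
    if old == (r_rs2 & MASK32):        # only the low-order 32 bits are compared
        mem[addr] = r_rd & MASK32
    return old                         # high-order 32 bits of R[rd] are 0


def increment_counter(mem: dict, addr: int) -> int:
    """Shared-counter update built on casxa (retry until the swap succeeds)."""
    while True:
        expected = mem[addr]
        if casxa(mem, addr, expected, expected + 1) == expected:
            return (expected + 1) & MASK64
```

The increment_counter loop shows the usual pattern for the shared-counter use mentioned in the description: read, compute, then retry the compare-and-swap until no other update intervened.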


A compare-and-swap instruction comprises three operations: a load, a compare, and 
a swap. The overall instruction is atomic; that is, no intervening interrupts or 
deferred traps are recognized by the virtual processor and no intervening update 
resulting from a compare-and-swap, swap, load, load-store unsigned byte, or store 
instruction to the doubleword containing the addressed location, or any portion of it, 
is performed by the memory system. 


A compare-and-swap operation does not imply any memory barrier semantics. 
When compare-and-swap is used for synchronization, the same consideration 
should be given to memory barriers as if a load, store, or swap instruction were 
used. 


A compare-and-swap operation behaves as if it performs a store, either of a new 
value from R[rd] or of the previous value in memory. The addressed location must 
be writable, even if the values in memory and R[rs2] are not equal. 


If i = 0, the address space of the memory location is specified in the imm_asi field; if 
i = 1, the address space is specified in the ASI register. 


An attempt to execute a CASXA or CASA instruction when i = 1 and instruction bits 
12:5 are nonzero causes an illegal_instruction exception. 


A mem_address_not_aligned exception is generated if the address in R[rs1] is not 
properly aligned. 


In nonprivileged mode (PSTATE.priv = 0), if bit 7 of the ASI is 0, CASXA and CASA 
cause a privileged_action exception. In privileged mode (PSTATE.priv = 1), if the ASI 
is in the range 30₁₆ to 7F₁₆, CASXA and CASA cause a privileged_action exception. 


Compatibility | An implementation might cause an exception because of an 
Note | error during the store memory access, even though there was no 
error during the load memory access. 


Programming | Compare and Swap (CAS) and Compare and Swap Extended 
Note | (CASX) synthetic instructions are available for “big endian” 
memory accesses. Compare and Swap Little (CASL) and Compare 
and Swap Extended Little (CASXL) synthetic instructions are 
available for “little endian” memory accesses. See Synthetic 
Instructions on page 536 for the syntax of these synthetic 
instructions. 





The compare-and-swap instructions do not affect the condition codes. 


The compare-and-swap instructions can be used with any of the following ASIs, 
subject to the privilege mode rules described for the privileged_action exception 
above. Use of any other ASI with these instructions causes a data_access_exception 
exception. 


ASIs valid for CASA and CASXA instructions 
ASI_NUCLEUS                 ASI_NUCLEUS_LITTLE 
ASI_AS_IF_USER_PRIMARY      ASI_AS_IF_USER_PRIMARY_LITTLE 
ASI_AS_IF_USER_SECONDARY    ASI_AS_IF_USER_SECONDARY_LITTLE 
ASI_REAL                    ASI_REAL_LITTLE 
ASI_PRIMARY                 ASI_PRIMARY_LITTLE 
ASI_SECONDARY               ASI_SECONDARY_LITTLE 


Exceptions illegal_instruction 
mem_address_not_aligned 
privileged_action 
VA_watchpoint 
data_access_exception 

7.12 DONE 





Instruction  op3      Operation                                     Assembly Language Syntax  Class
DONE^P       11 1110  Return from Trap (skip trapped instruction)   done                      A1

|  10   | fcn = 0 0000 |  op3  |   —   |
 31  30   29        25   24 19   18  0


Description The DONE instruction restores the saved state from TSTATE[TL] (GL, CCR, ASI, 
PSTATE, and CWP), sets PC and NPC, and decrements TL. DONE sets 
PC ← TNPC[TL] and NPC ← TNPC[TL] + 4 (normally, the value of NPC saved at the 
time of the original trap and address of the instruction immediately after the one 
referenced by the NPC). 


Programming | The DONE and RETRY instructions are used to return from 
Notes | privileged trap handlers. 


Unlike RETRY, DONE ignores the contents of TPC[TL]. 


If the saved TNPC[TL] was not altered by trap handler software, DONE causes 
execution to resume immediately after the instruction that originally caused the trap 
(as if that instruction was “done” executing). 


Execution of a DONE instruction in the delay slot of a control-transfer instruction 
produces undefined results. 


If software writes invalid or inconsistent state to TSTATE before executing DONE, 
virtual processor behavior during and after execution of the DONE instruction is 
undefined. 


When PSTATE.am = 1, the more-significant 32 bits of the target instruction address 
are masked out (set to 0) before being sent to the memory system. 


IMPL. DEP. #417-S10: If (1) TSTATE[TL].pstate.am = 1 and (2) a DONE instruction 
is executed (which sets PSTATE.am to ‘1’ by restoring the value from 
TSTATE[TL].pstate.am to PSTATE.am), it is implementation dependent whether the 
DONE instruction masks (zeroes) the more-significant 32 bits of the values it places 
into PC and NPC. 


Exceptions. In privileged mode (PSTATE.priv = 1), an attempt to execute DONE 
while TL = 0 causes an illegal_instruction exception. An attempt to execute DONE 
(in any mode) with instruction bits 18:0 nonzero causes an illegal_instruction 
exception. 

In nonprivileged mode (PSTATE.priv = 0), an attempt to execute DONE causes a 
privileged_opcode exception. 


Implementation | In nonprivileged mode, an illegal_instruction exception due to TL = 0 
Note | does not occur. The privileged_opcode exception occurs instead, 
regardless of the current trap level (TL). 


Exceptions illegal_instruction 
privileged_opcode 


See Also RETRY on page 296 

7.13 Edge Handling Instructions 


Instruction  opf          Operation                               Assembly Language Syntax †            Class
EDGE8cc      0 0000 0000  Eight 8-bit edge boundary processing    edge8cc    reg_rs1, reg_rs2, reg_rd   C3
EDGE8Lcc     0 0000 0010  Eight 8-bit edge boundary processing,   edge8lcc   reg_rs1, reg_rs2, reg_rd   C3
                          little-endian
EDGE16cc     0 0000 0100  Four 16-bit edge boundary processing    edge16cc   reg_rs1, reg_rs2, reg_rd   C3
EDGE16Lcc    0 0000 0110  Four 16-bit edge boundary processing,   edge16lcc  reg_rs1, reg_rs2, reg_rd   C3
                          little-endian
EDGE32cc     0 0000 1000  Two 32-bit edge boundary processing     edge32cc   reg_rs1, reg_rs2, reg_rd   C3
EDGE32Lcc    0 0000 1010  Two 32-bit edge boundary processing,    edge32lcc  reg_rs1, reg_rs2, reg_rd   C3
                          little-endian

† The original assembly language mnemonics for these instructions did not include the “cc” suffix, as appears in the names of all other 
instructions that set the integer condition codes. The old, non-“cc” mnemonics are deprecated. Over time, assemblers will support 
the new mnemonics for these instructions. In the meantime, some older assemblers may recognize only the mnemonics without “cc”. 


|  10   |  rd   | 110110 |  rs1  |  opf  |  rs2  |
 31  30   29 25   24  19   18 14   13  5   4   0


Description These instructions handle the boundary conditions for parallel pixel scan line loops, 
where R[rs1] is the address of the next pixel to render and R[rs2] is the address of 
the last pixel in the scan line. 


EDGE8Lcc, EDGE16Lcc, and EDGE32Lcc are little-endian versions of EDGE8cc, 
EDGE16cc, and EDGE32cc. They produce an edge mask that is bit-reversed from 
their big-endian counterparts but are otherwise identical. This makes the mask 
consistent with the mask produced by the Partial Store instruction (see Partial Store 
on page 298) on little-endian data. 


A 2-bit (EDGE32cc), 4-bit (EDGE16cc), or 8-bit (EDGE8cc) pixel mask is stored in the 
least significant bits of R[rd]. The mask is computed from left and right edge masks 
as follows: 


1. The left edge mask is computed from the 3 least significant bits of R[rs1] and the 
right edge mask is computed from the 3 least significant bits of R[rs2], according 
to TABLE 7-6. 


2. If 32-bit address masking is disabled (PSTATE.am = 0, 64-bit addressing) and 
the upper 61 bits of R[rs1] are equal to the corresponding bits in R[rs2], R[rd] is 
set to the right edge mask anded with the left edge mask. 

3. If 32-bit address masking is enabled (PSTATE.am = 1, 32-bit addressing) and bits 
31:3 of R[rs1] match bits 31:3 of R[rs2], R[rd] is set to the right edge mask anded 
with the left edge mask. 


4. Otherwise, R[rd] is set to the left edge mask. 
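
The four steps above can be sketched as a single function. This Python model is illustrative only. The function name edge_mask and its parameters are assumptions, and the shift expressions are closed forms chosen to reproduce the entries of TABLE 7-6 rather than text taken from the specification. The condition-code side effects are not modeled.

```python
def edge_mask(size: int, r_rs1: int, r_rs2: int,
              little_endian: bool = False, am: bool = False) -> int:
    """Return the pixel mask written to R[rd] by EDGE<size>[L]cc.

    size is 8, 16, or 32; am stands in for PSTATE.am.
    """
    n = 64 // size               # elements per 8-byte block: 8, 4, or 2
    full = (1 << n) - 1
    step = size // 8             # bytes per element
    k1 = (r_rs1 & 0x7) // step   # element index of the left edge
    k2 = (r_rs2 & 0x7) // step   # element index of the right edge
    if little_endian:
        left = (full << k1) & full
        right = full >> (n - 1 - k2)
    else:
        left = full >> k1
        right = (full << (n - 1 - k2)) & full
    # Steps 2 and 3: compare the addresses above bit 2 (bits 31:3 if am = 1).
    addr_mask = 0xFFFFFFFF if am else 0xFFFFFFFFFFFFFFFF
    same_block = ((r_rs1 & addr_mask) >> 3) == ((r_rs2 & addr_mask) >> 3)
    return (left & right) if same_block else left
```

For instance, with 8-bit pixels, a scan line that starts at byte 3 and ends at byte 5 of the same 8-byte block yields the big-endian mask 0001 1100.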


The integer condition codes are set per the rules of the SUBcc instruction with the 
same operands (see Subtract on page 303). 


TABLE 7-6 lists edge mask specifications. 


TABLE 7-6 Edge Mask Specification 








Edge   R[rsn]   Big Endian                Little Endian
Size   {2:0}    Left Edge    Right Edge   Left Edge    Right Edge
8      000      1111 1111    1000 0000    1111 1111    0000 0001
8      001      0111 1111    1100 0000    1111 1110    0000 0011
8      010      0011 1111    1110 0000    1111 1100    0000 0111
8      011      0001 1111    1111 0000    1111 1000    0000 1111
8      100      0000 1111    1111 1000    1111 0000    0001 1111
8      101      0000 0111    1111 1100    1110 0000    0011 1111
8      110      0000 0011    1111 1110    1100 0000    0111 1111
8      111      0000 0001    1111 1111    1000 0000    1111 1111
16     00x      1111         1000         1111         0001
16     01x      0111         1100         1110         0011
16     10x      0011         1110         1100         0111
16     11x      0001         1111         1000         1111
32     0xx      11           10           11           01
32     1xx      01           11           10           11

Exceptions illegal_instruction 
See Also EDGE«8|16|32»[L]N on page 158 

7.14 Edge Handling Instructions (no CC) 


Instruction  opf          Operation                                      Assembly Language Syntax             Class
EDGE8N       0 0000 0001  Eight 8-bit edge boundary processing, no CC    edge8n    reg_rs1, reg_rs2, reg_rd   C3
EDGE8LN      0 0000 0011  Eight 8-bit edge boundary processing,          edge8ln   reg_rs1, reg_rs2, reg_rd   C3
                          little-endian, no CC
EDGE16N      0 0000 0101  Four 16-bit edge boundary processing, no CC    edge16n   reg_rs1, reg_rs2, reg_rd   C3
EDGE16LN     0 0000 0111  Four 16-bit edge boundary processing,          edge16ln  reg_rs1, reg_rs2, reg_rd   C3
                          little-endian, no CC
EDGE32N      0 0000 1001  Two 32-bit edge boundary processing, no CC     edge32n   reg_rs1, reg_rs2, reg_rd   C3
EDGE32LN     0 0000 1011  Two 32-bit edge boundary processing,           edge32ln  reg_rs1, reg_rs2, reg_rd   C3
                          little-endian, no CC

|  10   |  rd   | 110110 |  rs1  |  opf  |  rs2  |
 31  30   29 25   24  19   18 14   13  5   4   0


Description EDGE8[L]N, EDGE16[L]N, and EDGE32[L]N operate identically to EDGE8[L]cc, 
EDGE16[L]cc, and EDGE32[L]cc, respectively, but do not set the integer condition 
codes. 


See Edge Handling Instructions on page 156 for details. 

Exceptions illegal_instruction 


See Also EDGE«8|16|32»[L]cc on page 156 

7.15 Floating-Point Absolute Value 


Instruction  op3      opf          Operation              Assembly Language Syntax   Class
FABSs        11 0100  0 0000 1001  Absolute Value Single  fabss  freg_rs2, freg_rd   A1
FABSd        11 0100  0 0000 1010  Absolute Value Double  fabsd  freg_rs2, freg_rd   A1
FABSq        11 0100  0 0000 1011  Absolute Value Quad    fabsq  freg_rs2, freg_rd   C3

|  10   |  rd   |  op3  |   —   |  opf  |  rs2  |
 31  30   29 25   24 19   18 14   13  5   4   0


Description FABS copies the source floating-point register(s) to the destination floating-point 
register(s), with the sign bit cleared (set to 0). 


FABSs operates on single-precision (32-bit) floating-point registers, FABSd operates on 
double-precision (64-bit) floating-point register pairs, and FABSq operates on quad- 
precision (128-bit) floating-point register quadruples. 
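
Because FABS only clears the sign bit, it can be modeled as a mask on the raw register bits. The Python sketch below is illustrative only. The function names are hypothetical, and the helper conversions use the standard struct module to move between Python floats and IEEE 754 bit patterns.

```python
import struct


def fabss(bits32: int) -> int:
    """Clear the sign bit (bit 31) of a single-precision bit pattern."""
    return bits32 & 0x7FFFFFFF


def fabsd(bits64: int) -> int:
    """Clear the sign bit (bit 63) of a double-precision bit pattern."""
    return bits64 & 0x7FFFFFFFFFFFFFFF


def double_to_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_double(bits: int) -> float:
    """Return the Python float encoded by a 64-bit IEEE 754 pattern."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

Since no arithmetic is performed, a NaN operand keeps its payload and only loses its sign.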


These instructions clear (set to 0) both FSR.cexc and FSR.ftt. They do not round, do 
not modify FSR.aexc, and do not treat floating-point NaN values differently from 
other floating-point values. 


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FABSq instruction causes an 
illegal_instruction exception, allowing privileged software to 
emulate the instruction. 


An attempt to execute an FABS instruction when instruction bits 18:14 are nonzero 
causes an illegal_instruction exception. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FABS instruction causes an fp_disabled exception. 


Exceptions illegal_instruction 
fp_disabled 
fp_exception_other (FSR.ftt = unimplemented_FPop (FABSq)) 

7.16 Floating-Point Add 


Instruction  op3      opf          Operation   Assembly Language Syntax             Class
FADDs        11 0100  0 0100 0001  Add Single  fadds  freg_rs1, freg_rs2, freg_rd   A1
FADDd        11 0100  0 0100 0010  Add Double  faddd  freg_rs1, freg_rs2, freg_rd   A1
FADDq        11 0100  0 0100 0011  Add Quad    faddq  freg_rs1, freg_rs2, freg_rd   C3

|  10   |  rd   |  op3  |  rs1  |  opf  |  rs2  |
 31  30   29 25   24 19   18 14   13  5   4   0


Description The floating-point add instructions add the floating-point register(s) specified by the 
rs1 field and the floating-point register(s) specified by the rs2 field. The instructions 
then write the sum into the floating-point register(s) specified by the rd field. 


Rounding is performed as specified by FSR.rd. 


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FADDq instruction causes an 
illegal_instruction exception, allowing privileged software to 
emulate the instruction. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FADD instruction causes an fp_disabled exception. 


If the FPU is enabled, FADDq causes an fp_exception_other (with FSR.ftt = 
unimplemented_FPop), since that instruction is not implemented in hardware in 
UltraSPARC Architecture 2005 implementations. 


Note | An fp_exception_other with FSR.ftt = unfinished_FPop can occur 
if the operation detects unusual, implementation-specific 
conditions. 


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal_instruction 
fp_disabled 
fp_exception_other (FSR.ftt = unimplemented_FPop (FADDq)) 
fp_exception_other (FSR.ftt = unfinished_FPop) 
fp_exception_ieee_754 (OF, UF, NX, NV) 

7.17 Align Data 


Instruction  opf          Operation                           Assembly Language Syntax                  Class
FALIGNDATA   0 0100 1000  Perform data alignment for          faligndata  freg_rs1, freg_rs2, freg_rd   A1
                          misaligned data

|  10   |  rd   | 110110 |  rs1  |  opf  |  rs2  |
 31  30   29 25   24  19   18 14   13  5   4   0


Description FALIGNDATA concatenates the two 64-bit floating-point registers specified by rs1 
and rs2 to form a 128-bit (16-byte) intermediate value. The contents of the first 
source operand form the more-significant 8 bytes of the intermediate value, and the 
contents of the second source operand form the less significant 8 bytes of the 
intermediate value. Bytes in the intermediate value are numbered from most 
significant (byte 0) to least significant (byte 15). Eight bytes are extracted from the 
intermediate value and stored in the 64-bit floating-point destination register 
specified by rd. GSR.align specifies the number of the most significant byte to extract 
(and, therefore, the least significant byte extracted is numbered GSR.align+7). 


GSR.align is normally set by a previous ALIGNADDRESS instruction. 
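
The extraction described above amounts to taking an 8-byte window out of a 16-byte value. The Python sketch below is illustrative only. The function name faligndata and the use of unsigned 64-bit integers for register contents are assumptions.

```python
def faligndata(fp_rs1: int, fp_rs2: int, gsr_align: int) -> int:
    """Model FALIGNDATA: extract bytes GSR.align .. GSR.align+7.

    Byte 0 is the most significant byte of Fp[rs1]; byte 15 is the
    least significant byte of Fp[rs2].
    """
    intermediate = fp_rs1.to_bytes(8, "big") + fp_rs2.to_bytes(8, "big")
    start = gsr_align & 0x7          # GSR.align is a 3-bit field
    return int.from_bytes(intermediate[start:start + 8], "big")
```

With GSR.align = 0 the result is the first source operand unchanged; with GSR.align = 3 the result takes the low five bytes of the first operand followed by the high three bytes of the second.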


FIGURE 7-6 FALIGNDATA 


A byte-aligned 64-bit load can be performed as shown below. 


alignaddr   Address, Offset, Address   !set GSR.align 
ldd         [Address], %d0 
ldd         [Address + 8], %d2 
faligndata  %d0, %d2, %d4              !use GSR.align to select bytes 





If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FALIGNDATA instruction causes an fp_disabled exception. 


Exceptions fp_disabled 
See Also Align Address on page 135 

7.18 Branch on Floating-Point Condition 
Codes (FBfcc) 

















Opcode    cond  Operation                                 fcc Test      Assembly Language Syntax  Class
FBA^D     1000  Branch Always                             1             fba{,a}    label          A1
FBN^D     0000  Branch Never                              0             fbn{,a}    label          A1
FBU^D     0111  Branch on Unordered                       U             fbu{,a}    label          A1
FBG^D     0110  Branch on Greater                         G             fbg{,a}    label          A1
FBUG^D    0101  Branch on Unordered or Greater            G or U        fbug{,a}   label          A1
FBL^D     0100  Branch on Less                            L             fbl{,a}    label          A1
FBUL^D    0011  Branch on Unordered or Less               L or U        fbul{,a}   label          A1
FBLG^D    0010  Branch on Less or Greater                 L or G        fblg{,a}   label          A1
FBNE^D    0001  Branch on Not Equal                       L or G or U   fbne†{,a}  label          A1
FBE^D     1001  Branch on Equal                           E             fbe‡{,a}   label          A1
FBUE^D    1010  Branch on Unordered or Equal              E or U        fbue{,a}   label          A1
FBGE^D    1011  Branch on Greater or Equal                E or G        fbge{,a}   label          A1
FBUGE^D   1100  Branch on Unordered or Greater or Equal   E or G or U   fbuge{,a}  label          A1
FBLE^D    1101  Branch on Less or Equal                   E or L        fble{,a}   label          A1
FBULE^D   1110  Branch on Unordered or Less or Equal      E or L or U   fbule{,a}  label          A1
FBO^D     1111  Branch on Ordered                         E or L or G   fbo{,a}    label          A1
† synonym: fbnz    ‡ synonym: fbz

|  00   | a  |  cond  |  110   |  disp22  |
 31  30   29   28  25   24  22   21      0


Programming | To set the annul (a) bit for FBfcc instructions, append ",a" to 
Note | the opcode mnemonic. For example, use "fbl,a label". In the 
preceding table, braces around ",a" signify that ",a" is 
optional. 


Description Unconditional and Fcc branches are described below: 


■ Unconditional branches (FBA, FBN) — If its annul field is 0, an FBN (Branch 
Never) instruction acts like a NOP. If its annul field is 1, the following (delay) 
instruction is annulled (not executed) when the FBN is executed. In neither case 
does a transfer of control take place. 

FBA (Branch Always) causes a PC-relative, delayed control transfer to the address 
“PC + (4 × sign_ext(disp22))” regardless of the value of the floating-point 
condition code bits. If the annul field of the branch instruction is 1, the delay 
instruction is annulled (not executed). If the annul (a) bit is 0, the delay 
instruction is executed. 


■ Fcc-conditional branches — Conditional FBfcc instructions (except FBA and 
FBN) evaluate floating-point condition code zero (fcc0) according to the cond 
field of the instruction. Such evaluation produces either a TRUE or FALSE result. 
If TRUE, the branch is taken; that is, the instruction causes a PC-relative, delayed 
control transfer to the address “PC + (4 × sign_ext(disp22))”. If FALSE, the branch 
is not taken. 




















If a conditional branch is taken, the delay instruction is always executed, 
regardless of the value of the annul (a) bit. If a conditional branch is not taken 
and the annul bit is 1 (a = 1), the delay instruction is annulled (not executed). 


Note | The annul bit has a different effect on conditional branches than 
it does on unconditional branches. 


Annulment, delay instructions, and delayed control transfers are described 
further in Chapter 6. 
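
The fcc tests in the opcode table can be written as sets of relations. The Python sketch below is illustrative only. The names are hypothetical, and the numeric encoding of the condition code (0 = equal, 1 = less, 2 = greater, 3 = unordered) is the FSR.fcc encoding defined elsewhere in the architecture, not in this section.

```python
# FSR.fcc encoding (assumed from the FSR definition): E, L, G, U.
E, L, G, U = 0, 1, 2, 3

# For each cond value, the set of fcc values for which the branch is taken.
FBFCC_TESTS = {
    0b1000: {E, L, G, U},  # FBA
    0b0000: set(),         # FBN
    0b0111: {U},           # FBU
    0b0110: {G},           # FBG
    0b0101: {G, U},        # FBUG
    0b0100: {L},           # FBL
    0b0011: {L, U},        # FBUL
    0b0010: {L, G},        # FBLG
    0b0001: {L, G, U},     # FBNE
    0b1001: {E},           # FBE
    0b1010: {E, U},        # FBUE
    0b1011: {E, G},        # FBGE
    0b1100: {E, G, U},     # FBUGE
    0b1101: {E, L},        # FBLE
    0b1110: {E, L, U},     # FBULE
    0b1111: {E, L, G},     # FBO
}


def fbfcc_taken(cond: int, fcc: int) -> bool:
    """Return True if an FBfcc with this cond field is taken for fcc."""
    return fcc in FBFCC_TESTS[cond]
```

Note that FBNE is taken on an unordered result while FBLG is not, which is the distinction to keep in mind when comparing values that may be NaN.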


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an FBfcc instruction causes an fp_disabled exception. 


Exceptions fp_disabled 

7.19 Branch on Floating-Point Condition 
Codes with Prediction (FBPfcc) 

















Instruction  cond  Operation                        fcc Test      Assembly Language Syntax             Class
FBPA         1000  Branch Always                    1             fba{,a}{,pt|,pn}    %fccn, label     A1
FBPN         0000  Branch Never                     0             fbn{,a}{,pt|,pn}    %fccn, label     A1
FBPU         0111  Branch on Unordered              U             fbu{,a}{,pt|,pn}    %fccn, label     A1
FBPG         0110  Branch on Greater                G             fbg{,a}{,pt|,pn}    %fccn, label     A1
FBPUG        0101  Branch on Unordered or Greater   G or U        fbug{,a}{,pt|,pn}   %fccn, label     A1
FBPL         0100  Branch on Less                   L             fbl{,a}{,pt|,pn}    %fccn, label     A1
FBPUL        0011  Branch on Unordered or Less      L or U        fbul{,a}{,pt|,pn}   %fccn, label     A1
FBPLG        0010  Branch on Less or Greater        L or G        fblg{,a}{,pt|,pn}   %fccn, label     A1
FBPNE        0001  Branch on Not Equal              L or G or U   fbne†{,a}{,pt|,pn}  %fccn, label     A1
FBPE         1001  Branch on Equal                  E             fbe‡{,a}{,pt|,pn}   %fccn, label     A1
FBPUE        1010  Branch on Unordered or Equal     E or U        fbue{,a}{,pt|,pn}   %fccn, label     A1
FBPGE        1011  Branch on Greater or Equal       E or G        fbge{,a}{,pt|,pn}   %fccn, label     A1
FBPUGE       1100  Branch on Unordered or Greater   E or G or U   fbuge{,a}{,pt|,pn}  %fccn, label     A1
                   or Equal
FBPLE        1101  Branch on Less or Equal          E or L        fble{,a}{,pt|,pn}   %fccn, label     A1
FBPULE       1110  Branch on Unordered or Less or   E or L or U   fbule{,a}{,pt|,pn}  %fccn, label     A1
                   Equal
FBPO         1111  Branch on Ordered                E or L or G   fbo{,a}{,pt|,pn}    %fccn, label     A1
† synonym: fbnz    ‡ synonym: fbz

|  00   | a  |  cond  |  101   | cc1 | cc0 | p  |  disp19  |
 31  30   29   28  25   24  22   21    20    19   18      0

cc1  cc0  Condition Code
0    0    fcc0
0    1    fcc1
1    0    fcc2
1    1    fcc3

Programming | To set the annul (a) bit for FBPfcc instructions, append ",a" to the 
Note | opcode mnemonic. For example, use "fbl,a %fcc3, label". In 
the preceding table, braces signify that the ",a" is optional. To set 
the branch prediction bit, append either ",pt" (for predict taken) 
or ",pn" (for predict not taken) to the opcode mnemonic. If neither 
",pt" nor ",pn" is specified, the assembler defaults to ",pt". To 
select the appropriate floating-point condition code, include 
"%fcc0", "%fcc1", "%fcc2", or "%fcc3" before the label. 


Description Unconditional branches and Fcc-conditional branches are described below. 


■ Unconditional branches (FBPA, FBPN) — If its annul field is 0, an FBPN 
(Floating-Point Branch Never with Prediction) instruction acts like a NOP. If the 
Branch Never’s annul field is 0, the following (delay) instruction is executed; if 
the annul (a) bit is 1, the following instruction is annulled (not executed). In no 
case does an FBPN cause a transfer of control to take place. 


FBPA (Floating-Point Branch Always with Prediction) causes an unconditional 
PC-relative, delayed control transfer to the address 
“PC + (4 × sign_ext(disp19))”. If the annul field of the branch instruction is 1, the 
delay instruction is annulled (not executed). If the annul (a) bit is 0, the delay 
instruction is executed. 


■ Fcc-conditional branches — Conditional FBPfcc instructions (except FBPA and 
FBPN) evaluate one of the four floating-point condition codes (fcc0, fcc1, fcc2, 
fcc3) as selected by cc0 and cc1, according to the cond field of the instruction, 
producing either a TRUE or FALSE result. If TRUE, the branch is taken; that is, the 
instruction causes a PC-relative, delayed control transfer to the address 
“PC + (4 × sign_ext(disp19))”. If FALSE, the branch is not taken. 




















If a conditional branch is taken, the delay instruction is always executed, 
regardless of the value of the annul (a) bit. If a conditional branch is not taken 
and the annul bit is 1 (a = 1), the delay instruction is annulled (not executed). 


Note | The annul bit has a different effect on conditional branches than it 
does on unconditional branches. 


The predict bit (p) gives the hardware a hint about whether the branch is expected 
to be taken. A 1 in the p bit indicates that the branch is expected to be taken. A 0 
indicates that the branch is expected not to be taken. 


Annulment, delay instructions, and delayed control transfers are described further 
in Chapter 6, Instruction Set Overview. 
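The branch and annul rules above can be summarized in a small Python model. This is an illustrative sketch only, not part of the architecture definition: the function name fbpfcc, its argument order, and its return tuple are assumptions made here. The cond-to-relation mapping is transcribed from the FBPfcc and FMOVcc encoding tables, with fcc values 0 to 3 standing for E, L, G, and U.

```python
MASK64 = (1 << 64) - 1

# Values held in a floating-point condition code field.
E, L, G, U = 0, 1, 2, 3

# cond encoding -> set of fcc values for which the test is TRUE.
FCC_COND = {
    0b1000: {E, L, G, U},  # a   (always)
    0b0000: set(),         # n   (never)
    0b0111: {U},           # u
    0b0110: {G},           # g
    0b0101: {G, U},        # ug
    0b0100: {L},           # l
    0b0011: {L, U},        # ul
    0b0010: {L, G},        # lg
    0b0001: {L, G, U},     # ne
    0b1001: {E},           # e
    0b1010: {E, U},        # ue
    0b1011: {E, G},        # ge
    0b1100: {E, G, U},     # uge
    0b1101: {E, L},        # le
    0b1110: {E, L, U},     # ule
    0b1111: {E, L, G},     # o
}


def sign_ext(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def fbpfcc(cond, a, fcc, pc, disp19):
    """Return (taken, target, delay_executed) for one FBPfcc instruction.

    target is None when the branch is not taken.
    """
    taken = fcc in FCC_COND[cond]
    if cond in (0b1000, 0b0000):
        # FBPA / FBPN: the annul bit alone decides the delay instruction.
        delay_executed = a == 0
    else:
        # Conditional: the delay instruction is annulled only when the
        # branch is not taken and a = 1.
        delay_executed = taken or a == 0
    target = (pc + 4 * sign_ext(disp19, 19)) & MASK64 if taken else None
    return taken, target, delay_executed
```

For instance, under this model "fble,a" with fcc = G is not taken and annuls its delay instruction, while FBPA with a = 1 is taken and still annuls it, which is the difference the preceding Note points out.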


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FBPfcc instruction causes an fp_disabled exception.


Exceptions fp_disabled


7.20 SIMD Signed Compare








Instruction opf Operation s1 s2 d Assembly Language Syntax Class

FCMPLE16 0 0010 0000 Four 16-bit compare; set R[rd] if src1 ≤ src2 f64 f64 i64 fcmple16 freg_rs1, freg_rs2, reg_rd C3
FCMPNE16 0 0010 0010 Four 16-bit compare; set R[rd] if src1 ≠ src2 f64 f64 i64 fcmpne16 freg_rs1, freg_rs2, reg_rd C3
FCMPLE32 0 0010 0100 Two 32-bit compare; set R[rd] if src1 ≤ src2 f64 f64 i64 fcmple32 freg_rs1, freg_rs2, reg_rd C3
FCMPNE32 0 0010 0110 Two 32-bit compare; set R[rd] if src1 ≠ src2 f64 f64 i64 fcmpne32 freg_rs1, freg_rs2, reg_rd C3
FCMPGT16 0 0010 1000 Four 16-bit compare; set R[rd] if src1 > src2 f64 f64 i64 fcmpgt16 freg_rs1, freg_rs2, reg_rd C3
FCMPEQ16 0 0010 1010 Four 16-bit compare; set R[rd] if src1 = src2 f64 f64 i64 fcmpeq16 freg_rs1, freg_rs2, reg_rd C3
FCMPGT32 0 0010 1100 Two 32-bit compare; set R[rd] if src1 > src2 f64 f64 i64 fcmpgt32 freg_rs1, freg_rs2, reg_rd C3
FCMPEQ32 0 0010 1110 Two 32-bit compare; set R[rd] if src1 = src2 f64 f64 i64 fcmpeq32 freg_rs1, freg_rs2, reg_rd C3


[Instruction format: 10 (bits 31:30) | rd (29:25) | 110110 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)]


Description Either four 16-bit signed values or two 32-bit signed values in Fp[rs1] and Fp[rs2] 
are compared. The 4-bit or 2-bit condition-code results are stored in the least 
significant bits of the integer register R[rd]. The least significant 16-bit or 32-bit 
compare result corresponds to bit zero of R[rd]. 


Note | Bits 63:4 of the destination register R[rd] are set to zero for 16-bit 
compares. Bits 63:2 of the destination register R[rd] are set to 
zero for 32-bit compares. 


For FCMPGT{16,32}, each bit in the result is set to 1 if the corresponding signed 
value in Fp[rs1] is greater than the signed value in Fp[rs2]. Less-than comparisons 
are made by swapping the operands. 


For FCMPLE{16,32}, each bit in the result is set to 1 if the corresponding signed value 
in Fp[rs1] is less than or equal to the signed value in Fp[rs2]. Greater-than-or-equal 
comparisons are made by swapping the operands. 


For FCMPEQ{16,32}, each bit in the result is set to 1 if the corresponding signed
value in Fp[rs1] is equal to the signed value in Fp[rs2].


For FCMPNE{16,32}, each bit in the result is set to 1 if the corresponding signed
value in Fp[rs1] is not equal to the signed value in Fp[rs2].


FIGURE 7-7 and FIGURE 7-8 illustrate 16-bit and 32-bit pixel comparison operations, 
respectively. 


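[FIGURE 7-7: the four 16-bit fields of Fp[rs1] and Fp[rs2] (bits 63:48, 47:32, 31:16, 15:0) are compared pairwise by fcmp{gt,le,eq,ne,lt,ge}16, producing one result bit per field in R[rd]]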








FIGURE 7-8 Two 32-bit Signed Fixed-point SIMD Comparison Operation 


In all comparisons, if a compare condition is not true, the corresponding bit in the 
result is set to 0. 
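As an illustration of the result layout described above, the following Python sketch models the eight compare instructions on 64-bit register images. The helper names (_lanes, simd_compare) and the use of plain integers for Fp[rs1], Fp[rs2], and R[rd] are assumptions of this sketch, not architectural definitions.

```python
import operator


def _lanes(value, width):
    """Split a 64-bit value into signed fields, least significant field first."""
    mask = (1 << width) - 1
    fields = []
    for i in range(64 // width):
        x = (value >> (i * width)) & mask
        if x >> (width - 1):          # sign bit set: two's-complement negative
            x -= 1 << width
        fields.append(x)
    return fields


def simd_compare(op, width, rs1, rs2):
    """Return the R[rd] image: bit i is 1 if op holds for field i, else 0.

    All upper bits of the result are zero, matching the Note above.
    """
    result = 0
    for i, (x, y) in enumerate(zip(_lanes(rs1, width), _lanes(rs2, width))):
        if op(x, y):
            result |= 1 << i
    return result


def fcmpgt16(rs1, rs2): return simd_compare(operator.gt, 16, rs1, rs2)
def fcmple16(rs1, rs2): return simd_compare(operator.le, 16, rs1, rs2)
def fcmpeq16(rs1, rs2): return simd_compare(operator.eq, 16, rs1, rs2)
def fcmpne16(rs1, rs2): return simd_compare(operator.ne, 16, rs1, rs2)
def fcmpgt32(rs1, rs2): return simd_compare(operator.gt, 32, rs1, rs2)
def fcmple32(rs1, rs2): return simd_compare(operator.le, 32, rs1, rs2)
def fcmpeq32(rs1, rs2): return simd_compare(operator.eq, 32, rs1, rs2)
def fcmpne32(rs1, rs2): return simd_compare(operator.ne, 32, rs1, rs2)
```

In this model a less-than comparison is obtained as fcmpgt16(rs2, rs1), that is, by swapping the operands as the text describes.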


Programming | The results of a SIMD signed compare operation can be used 
Note | directly by both integer operations (for example, partial stores) 
and partitioned conditional moves. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute a SIMD signed compare instruction causes an fp_disabled
exception.


Exceptions fp_disabled


See Also STPARTIALF on page 329


7.21 Floating-Point Compare











Instruction opf Operation Assembly Language Syntax Class

FCMPs 0 0101 0001 Compare Single fcmps %fccn, freg_rs1, freg_rs2 A1
FCMPd 0 0101 0010 Compare Double fcmpd %fccn, freg_rs1, freg_rs2 A1
FCMPq 0 0101 0011 Compare Quad fcmpq %fccn, freg_rs1, freg_rs2 C3
FCMPEs 0 0101 0101 Compare Single and Exception if Unordered fcmpes %fccn, freg_rs1, freg_rs2 A1
FCMPEd 0 0101 0110 Compare Double and Exception if Unordered fcmped %fccn, freg_rs1, freg_rs2 A1
FCMPEq 0 0101 0111 Compare Quad and Exception if Unordered fcmpeq %fccn, freg_rs1, freg_rs2 C3


[Instruction format: 10 (bits 31:30) | — (29:27) | cc1 (26) | cc0 (25) | 11 0101 (24:19) | rs1 (18:14) | opf (13:5) | rs2 (4:0)]


cc1 cc0 Condition Code
0 0 fcc0
0 1 fcc1
1 0 fcc2
1 1 fcc3


Description These instructions compare F[rs1] with F[rs2] and set the selected floating-point
condition code (fccn) as follows:


Relation Resulting fcc value
freg_rs1 = freg_rs2 0
freg_rs1 < freg_rs2 1
freg_rs1 > freg_rs2 2
freg_rs1 ? freg_rs2 (unordered) 3


The "?" in the preceding table means that the compared values are unordered. The
unordered condition occurs when one or both of the operands to the comparison is a
signalling or quiet NaN.


The "compare and cause exception if unordered" (FCMPEs, FCMPEd, and FCMPEq)
instructions cause an invalid (NV) exception if either operand is a NaN.


FCMP causes an invalid (NV) exception if either operand is a signalling NaN.


V8 Compatibility | Unlike the SPARC V8 architecture, SPARC V9 and the 
Note | UltraSPARC Architecture do not require an instruction between a 
floating-point compare operation and a floating-point branch 
(FBfcc, FBPfcc). 


SPARC V8 floating-point compare instructions are required to 
have rd = 0. In SPARC V9 and the UltraSPARC Architecture, bits 
26 and 25 of the instruction (rd{1:0}) specify the floating-point 
condition code to be set. Legal SPARC V8 code will work on 
SPARC V9 and the UltraSPARC Architecture because the zeroes 
in the R[rd] field are interpreted as fcc0 and the FBfcc 
instruction branches based on the value of fcc0.


An attempt to execute an FCMP instruction when instruction bits 29:27 are nonzero
causes an illegal_instruction exception.
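The fcc encoding and the NaN rules for FCMP and FCMPE can be sketched in Python on raw double-precision bit patterns. This is a hedged model, not the architectural definition: the function names are invented here, and the test for a signalling NaN assumes the common convention that a NaN whose most significant fraction bit is 0 is signalling, which this section does not itself state.

```python
import struct

FRACTION_MASK = (1 << 52) - 1


def bits_of(x):
    """Return the 64-bit IEEE 754 pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def _to_float(bits):
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def _is_nan(bits):
    # Exponent all ones and a nonzero fraction.
    return (bits >> 52) & 0x7FF == 0x7FF and bits & FRACTION_MASK != 0


def _is_signalling_nan(bits):
    # Assumption: quiet NaNs have fraction bit 51 set, signalling NaNs clear.
    return _is_nan(bits) and not (bits >> 51) & 1


def fcmpd(rs1_bits, rs2_bits, exception_if_unordered=False):
    """Model FCMPd (or FCMPEd when exception_if_unordered is True).

    Returns (fcc, nv): the value written to the selected fcc field and
    whether an invalid (NV) exception condition is raised.
    """
    if _is_nan(rs1_bits) or _is_nan(rs2_bits):
        nv = (exception_if_unordered
              or _is_signalling_nan(rs1_bits)
              or _is_signalling_nan(rs2_bits))
        return 3, nv                      # unordered
    x, y = _to_float(rs1_bits), _to_float(rs2_bits)
    if x == y:
        return 0, False                   # equal
    return (1 if x < y else 2), False     # less / greater
```

A quiet NaN operand therefore yields fcc = 3 without NV under fcmpd, and fcc = 3 with NV when exception_if_unordered is set, mirroring the FCMP and FCMPE descriptions.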


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware the instructions that refer to quad-precision
floating-point registers. An attempt to execute FCMPq or FCMPEq
generates fp_exception_other (with
FSR.ftt = unimplemented_FPop), which causes a trap, allowing
privileged software to emulate the instruction.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FCMP or FCMPE instruction causes an fp_disabled exception.


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal_instruction
fp_disabled
fp_exception_ieee_754 (NV)
fp_exception_other (FSR.ftt = unimplemented_FPop (FCMPq, FCMPEq only))


7.22 Floating-Point Divide


Instruction op3 opf Operation Assembly Language Syntax Class
FDIVs 11 0100 0 0100 1101 Divide Single fdivs freg_rs1, freg_rs2, freg_rd A1
FDIVd 11 0100 0 0100 1110 Divide Double fdivd freg_rs1, freg_rs2, freg_rd A1
FDIVq 11 0100 0 0100 1111 Divide Quad fdivq freg_rs1, freg_rs2, freg_rd C3


[Instruction format diagram; field boundaries at bits 31 30 | 29 25 | 24 19 | 18 14 | 13 5 | 4 0]


Description


The floating-point divide instructions divide the contents of the floating-point 
register(s) specified by the rs1 field by the contents of the floating-point register(s) 
specified by the rs2 field. The instructions then write the quotient into the floating- 
point register(s) specified by the rd field. 


Rounding is performed as specified by FSR.rd. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute a floating-point divide instruction causes an fp_disabled exception.


If the FPU is enabled, FDIVq causes an fp_exception_other (with FSR.ftt =
unimplemented_FPop), since that instruction is not implemented in hardware in
UltraSPARC Architecture 2005 implementations.


Note | For FDIVs and FDIVd, an fp_exception_other with
FSR.ftt = unfinished_FPop can occur if the divide unit detects
unusual, implementation-specific conditions.


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FDIVq only))
fp_exception_other (FSR.ftt = unfinished_FPop (FDIVs, FDIVd))
fp_exception_ieee_754 (OF, UF, DZ, NV, NX)


7.23 FEXPAND





Instruction opf Operation s1 s2 d Assembly Language Syntax Class
FEXPAND 0 0100 1101 Four 16-bit expands — f32 f64 fexpand freg_rs2, freg_rd C3


[Instruction format diagram; field boundaries at bits 31 30 | 29 25 | 24 19 | 18 14 | 13 5 | 4 0]


Description FEXPAND takes four 8-bit unsigned integers from Fs[rs2], converts each integer to a
16-bit fixed-point value, and stores the four resulting 16-bit values in a 64-bit
floating-point register Fp[rd]. FIGURE 7-9 illustrates the operation.






































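[Figure: each 8-bit field of Fs[rs2] (bits 31:24, 23:16, 15:8, 7:0) is placed in bits 11:4 of the corresponding 16-bit field of Fp[rd] (fields 63:48, 47:32, 31:16, 15:0); the upper four and lower four bits of each 16-bit field are 0000]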


FIGURE 7-9 FEXPAND Operation 


This operation is carried out as follows: 
1. Left-shift each 8-bit value by 4 and zero-extend each result to a 16-bit fixed value. 


2. Store the result in the destination register, Fp[rd]. 
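The two steps above can be sketched in Python on integer register images. The function name fexpand and the use of plain integers for Fs[rs2] and Fp[rd] are assumptions of this sketch.

```python
def fexpand(rs2):
    """Expand four unsigned 8-bit values (a 32-bit image) to four 16-bit fields.

    Each byte is shifted left by 4 and zero-extended to 16 bits; field i of
    the 64-bit result comes from byte i of the source.
    """
    result = 0
    for i in range(4):
        byte = (rs2 >> (8 * i)) & 0xFF
        result |= (byte << 4) << (16 * i)
    return result
```

For example, under this model the source bytes FF, 00, 80, 01 (most significant first) expand to the 16-bit fields 0FF0, 0000, 0800, 0010.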


Programming | FEXPAND performs the inverse of the FPACK16 operation. 
Note 


In an UltraSPARC Architecture 2005 implementation, this instruction is not 
implemented in hardware, causes an illegal_instruction exception, and is emulated in 
software. 


An attempt to execute an FEXPAND instruction when instruction bits 18:14 are
nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction


See Also FPMERGE on page 206
FPACK on page 197


7.24 Convert 32-bit Integer to Floating Point


Instruction op3 opf Operation s1 s2 d Assembly Language Syntax Class

FiTOs 11 0100 0 1100 0100 Convert 32-bit Integer to Single — f32 f32 fitos freg_rs2, freg_rd A1
FiTOd 11 0100 0 1100 1000 Convert 32-bit Integer to Double — f32 f64 fitod freg_rs2, freg_rd A1
FiTOq 11 0100 0 1100 1100 Convert 32-bit Integer to Quad — f32 f128 fitoq freg_rs2, freg_rd C3


[Instruction format diagram; field boundaries at bits 31 30 | 29 25 | 24 19 | 18 14 | 13 5 | 4 0]


Description FiTOs, FiTOd, and FiTOq convert the 32-bit signed integer operand in floating-point 
register Fs[rs2] into a floating-point number in the destination format. All write 
their result into the floating-point register(s) specified by rd. 


The value of FSR.rd determines how rounding is performed by FiTOs. 
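Rounding matters only for FiTOs because a single-precision significand holds 24 bits, fewer than a 32-bit integer can need, whereas double precision represents every 32-bit integer exactly. The Python sketch below illustrates when the inexact (NX) condition would arise. It is a hedged model: the name fitos is reused only for illustration, and the sketch covers only round-to-nearest, which is what Python's struct conversion performs, not the other rounding directions selectable through FSR.rd.

```python
import struct


def fitos(i32):
    """Convert a signed 32-bit integer to single precision (round to nearest).

    Returns (value, nx), where nx is True if the conversion was inexact.
    """
    if not -2**31 <= i32 < 2**31:
        raise ValueError("operand is not a signed 32-bit integer")
    # float(i32) is exact in double precision; packing as 'f' rounds to single.
    single = struct.unpack(">f", struct.pack(">f", float(i32)))[0]
    return single, single != i32
```

Integers of magnitude up to 2^24 convert exactly in this model; 2^24 + 1 is the smallest positive value that is rounded.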


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute a FiTOq instruction causes an
illegal_instruction exception, allowing privileged software to
emulate the instruction.


An attempt to execute an FiTO<s|d|q> instruction when instruction bits 18:14 are
nonzero causes an illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FiTO<s|d|q> instruction causes an fp_disabled exception.


If the FPU is enabled, FiTOq causes an fp_exception_other (with FSR.ftt =
unimplemented_FPop), since that instruction is not implemented in hardware in
UltraSPARC Architecture 2005 implementations.


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FiTOq))
fp_exception_ieee_754 (NX (FiTOs only))


7.25 Flush Instruction Memory


Instruction op3 Operation Assembly Language Syntax† Class


FLUSH 11 1011 Flush Instruction Memory flush [address] A1


† The original assembly language syntax for a FLUSH instruction ("flush address") has been deprecated
because of inconsistency with other SPARC assembly language syntax. Over time, assemblers will support the
new syntax for this instruction. In the meantime, some existing assemblers may only recognize the original
syntax.


[Instruction format diagram (two forms, selected by the i bit); field boundaries at bits 31 30 | 29 25 | 24 19 | 18 14 | 13 | 12 5 | 4 0]


Description FLUSH ensures that the aligned doubleword specified by the effective address is 
consistent across any local caches and, in a multiprocessor system, will eventually 
(impl. dep. #122-V9) become consistent everywhere. 


The SPARC V9 instruction set architecture does not guarantee consistency between
instruction memory and data memory. When software writes¹ to a memory location
that may be executed as an instruction (self-modifying code²), a potential memory
consistency problem arises, which is addressed by the FLUSH instruction. Use of
FLUSH after instruction memory has been modified ensures that instruction and
data memory are synchronized for the processor that issues the FLUSH instruction.


The virtual processor waits until all previous (cacheable) stores have completed 
before issuing a FLUSH instruction. For the purpose of memory ordering, a FLUSH 
instruction behaves like a store instruction. 


In the following discussion P_FLUSH refers to the virtual processor that executed the
FLUSH instruction.


FLUSH causes a synchronization within a virtual processor which ensures that
instruction fetches from the specified effective address by P_FLUSH appear to execute
after any loads, stores, and atomic load-stores to that address issued by P_FLUSH prior
to the FLUSH. In a multiprocessor system, FLUSH also ensures that these values will
eventually become visible to the instruction fetches of all other virtual processors in
the system. With respect to MEMBAR-induced orderings, FLUSH behaves as if it is a
store operation (see Memory Barrier on page 260).


1. this includes use of store instructions (executed on the same or another virtual processor) that write to 
instruction memory, or any other means of writing into instruction memory (for example, DMA transfer) 


2. practiced, for example, by software such as debuggers and dynamic linkers


If i = 0, the effective address operand for the FLUSH instruction is "R[rs1] + R[rs2]";
if i = 1, it is "R[rs1] + sign_ext(simm13)". The three least-significant bits of the
effective address are ignored; that is, the effective address always refers to an
aligned doubleword.
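The address computation just described can be written out as a short Python sketch. The function and parameter names are illustrative assumptions; register contents are modeled as 64-bit unsigned integers.

```python
MASK64 = (1 << 64) - 1


def sign_ext(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def flush_effective_address(r_rs1, i, r_rs2=0, simm13=0):
    """Return the aligned doubleword address that a FLUSH refers to."""
    offset = sign_ext(simm13, 13) if i == 1 else r_rs2
    address = (r_rs1 + offset) & MASK64   # 64-bit wraparound
    return address & ~0x7                 # low three bits are ignored
```

Two FLUSH instructions whose operands differ only in the low three address bits therefore refer to the same doubleword in this model.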


See implementation-specific documentation for details on specific implementations 
of the FLUSH instruction. 


On an UltraSPARC Architecture processor: 


m A FLUSH instruction causes a synchronization within the virtual processor on 
which the FLUSH is executed, which flushes its instruction pipeline to ensure that 
no instruction already fetched has subsequently been modified in memory. Any 
other virtual processors on the same physical processor are unaffected by a 
FLUSH. 


m Coherency between instruction and data memories may or may not be 
maintained by hardware. 


IMPL. DEP. #409-S10-Cs20: The implementation of the FLUSH instruction is 
implementation dependent. If the implementation automatically maintains 
consistency between instruction and data memory, 
(1) the FLUSH address is ignored and 
(2) the FLUSH instruction cannot cause any data access exceptions, because 

its effective address operand is not translated or used by the MMU. 
On the other hand, if the implementation does not maintain consistency between 
instruction and data memory, the FLUSH address is used to access the MMU and the 
FLUSH instruction can cause data access exceptions. 


Programming | For portability across all SPARC V9 implementations, software 
Note | must always supply the target effective address in FLUSH 
instructions. 
m If the implementation contains instruction prefetch buffers: 
a the instruction prefetch buffer(s) are invalidated 


a instruction prefetching is suspended, but may resume starting with the 
instruction immediately following the FLUSH 


Programming | 1. Typically, FLUSH is used in self-modifying code.
Notes | The use of self-modifying code is discouraged.


2. If a program includes self-modifying code, to be portable it must
issue a FLUSH instruction for each modified doubleword of
instructions (or make a call to privileged software that has an
equivalent effect) after storing into the instruction stream.


3. The order in which memory is modified can be controlled by
means of FLUSH and MEMBAR instructions interspersed 
appropriately between stores and atomic load-stores. FLUSH is 
needed only between a store and a subsequent instruction fetch 
from the modified location. When multiple processes may 
concurrently modify live (that is, potentially executing) code, the 
programmer must ensure that the order of update maintains the 
program in a semantically correct form at all times. 


4. The memory model guarantees in a uniprocessor that data loads 
observe the results of the most recent store, even if there is no 
intervening FLUSH. 


5. FLUSH may be a time-consuming operation. 
(see the Implementation Note below) 


6. In a multiprocessor system, the effects of a FLUSH operation 
will be globally visible before any subsequent store becomes 
globally visible. 


7. FLUSH is designed to act on a doubleword. On some 
implementations, FLUSH may trap to system software. For these 
reasons, system software should provide a service routine, 
callable by nonprivileged software, for flushing arbitrarily-sized 
regions of memory. On some implementations, this routine 
would issue a series of FLUSH instructions; on others, it might 
issue a single trap to system software that would then flush the 
entire region. 


8. FLUSH operates using the current (implicit) context. Therefore, 
a FLUSH executed in privileged mode will use the nucleus 
context and will not necessarily affect instruction cache lines 
containing data from a user (nonprivileged) context. 
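Notes 2 and 7 imply that flushing a modified region means issuing one FLUSH per doubleword that the region touches. The following Python sketch computes those target addresses. It is only an illustration of the address arithmetic: the name flush_targets is hypothetical, and an actual service routine would be system software issuing FLUSH instructions or a trap, as note 7 describes.

```python
def flush_targets(start, length):
    """Return the aligned doubleword addresses covering [start, start + length)."""
    if length <= 0:
        return []
    first = start & ~0x7                   # doubleword holding the first byte
    last = (start + length - 1) & ~0x7     # doubleword holding the last byte
    return list(range(first, last + 8, 8))
```

An 8-byte region that is not doubleword aligned straddles two doublewords, so it needs two FLUSH operations in this model.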


Implementation | In a multiprocessor configuration, FLUSH requires all processors 
Note | that may be referencing the addressed doubleword to flush their 
instruction caches, which is a potentially disruptive activity. 


V9 Compatibility | The effect of a FLUSH instruction as observed from the virtual 
Note | processor on which FLUSH executes is immediate. Other virtual 
processors in a multiprocessor system eventually will see the 
effect of the FLUSH, but the latency is implementation dependent. 


An attempt to execute a FLUSH instruction when instruction bits 29:25 are nonzero 
causes an illegal_instruction exception. 


Exceptions illegal_instruction


7.26 Flush Register Windows


Instruction op3 Operation Assembly Language Syntax Class


FLUSHW 10 1011 Flush Register Windows flushw A1


[Instruction format diagram; field boundaries at bits 31 30 | 29 25 | 24 19 | 18 14 | 13 | 12 0]


Description  FLUSHW causes all active register windows except the current window to be 
flushed to memory at locations determined by privileged software. FLUSHW 
behaves as a NOP if there are no active windows other than the current window. At 
the completion of the FLUSHW instruction, the only active register window is the 
current one. 


Programming | The FLUSHW instruction can be used by application software to
Note | flush register windows to memory so that it can switch memory
stacks or examine register contents from previous stack frames.





FLUSHW acts as a NOP if CANSAVE = N_REG_WINDOWS − 2. Otherwise, there is
more than one active window, so FLUSHW causes a spill exception. The trap vector
for the spill exception is based on the contents of OTHERWIN and WSTATE. The spill
trap handler is invoked with the CWP set to the window to be spilled (that is,
(CWP + CANSAVE + 2) mod N_REG_WINDOWS). See Register Window Management
Instructions on page 116.


Programming | Typically, the spill handler saves a window on a memory stack 
Note | and returns to reexecute the FLUSHW instruction. Thus, FLUSHW 
traps and reexecutes until all active windows other than the 
current window have been spilled. 
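The trap-and-reexecute loop described above can be modeled in Python. This sketch rests on assumptions that are not stated in this section: it takes the window-state invariant CANSAVE + CANRESTORE + OTHERWIN = N_REG_WINDOWS − 2 as given, and it models the spill handler's bookkeeping as incrementing CANSAVE while decrementing OTHERWIN if it is nonzero, otherwise CANRESTORE. The function name and return shape are likewise illustrative.

```python
def flushw(cwp, cansave, canrestore, otherwin, n_reg_windows):
    """Model repeated FLUSHW execution until no other window is active.

    Returns (spilled, state): the window numbers handed to the spill
    handler, in order, and the final (cansave, canrestore, otherwin).
    """
    # Assumed invariant of the register-window state (see lead-in).
    assert cansave + canrestore + otherwin == n_reg_windows - 2
    spilled = []
    while cansave != n_reg_windows - 2:
        # Window the spill trap handler is entered with.
        spilled.append((cwp + cansave + 2) % n_reg_windows)
        # Assumed handler bookkeeping, then FLUSHW is reexecuted.
        cansave += 1
        if otherwin:
            otherwin -= 1
        else:
            canrestore -= 1
    return spilled, (cansave, canrestore, otherwin)
```

With eight windows, CWP = 3, CANSAVE = 2, and CANRESTORE = 4, the model spills four windows and ends with CANSAVE = 6, at which point FLUSHW acts as a NOP.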


An attempt to execute a FLUSHW instruction when instruction bits 29:25, 18:14, or
12:0 are nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction
spill_n_normal
spill_n_other





7.27 Floating-Point Move 


Instruction op3 opf Operation Assembly Language Syntax Class
FMOVs 11 0100 0 0000 0001 Move (copy) Single fmovs freg_rs2, freg_rd A1
FMOVd 11 0100 0 0000 0010 Move (copy) Double fmovd freg_rs2, freg_rd A1
FMOVq 11 0100 0 0000 0011 Move (copy) Quad fmovq freg_rs2, freg_rd C3


[Instruction format diagram; field boundaries at bits 31 30 | 29 25 | 24 19 | 18 14 | 13 5 | 4 0]


Description FMOV copies the source floating-point register(s) to the destination floating-point 
register(s), unaltered. 


FMOVs, FMOVd, and FMOVq perform 32-bit, 64-bit, and 128-bit operations, 
respectively. 


These instructions clear (set to 0) both FSR.cexc and FSR.ftt. They do not round, do
not modify FSR.aexc, and do not treat floating-point NaN values differently from
other floating-point values.


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute an FMOVq instruction causes an
illegal_instruction exception, allowing privileged software to
emulate the instruction.


An attempt to execute an FMOV instruction when instruction bits 18:14 are nonzero
causes an illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FMOV instruction causes an fp_disabled exception.


If the FPU is enabled, an attempt to execute an FMOVq instruction causes an
fp_exception_other (with FSR.ftt = unimplemented_FPop), since that instruction is
not implemented in hardware in UltraSPARC Architecture 2005 implementations.


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FMOVq only))


See Also F Register Logical Operate (2 operand) on page 212





7.28 Move Floating-Point Register on 
Condition (FMOVcc) 





Instruction opf_low Operation Assembly Language Syntax Class

FMOVSicc 00 0001 Move Floating-Point Single, based on 32-bit integer condition codes fmovsicc %icc, freg_rs2, freg_rd A1
FMOVDicc 00 0010 Move Floating-Point Double, based on 32-bit integer condition codes fmovdicc %icc, freg_rs2, freg_rd A1
FMOVQicc 00 0011 Move Floating-Point Quad, based on 32-bit integer condition codes fmovqicc %icc, freg_rs2, freg_rd C3

FMOVSxcc 00 0001 Move Floating-Point Single, based on 64-bit integer condition codes fmovsxcc %xcc, freg_rs2, freg_rd A1
FMOVDxcc 00 0010 Move Floating-Point Double, based on 64-bit integer condition codes fmovdxcc %xcc, freg_rs2, freg_rd A1
FMOVQxcc 00 0011 Move Floating-Point Quad, based on 64-bit integer condition codes fmovqxcc %xcc, freg_rs2, freg_rd C3

FMOVSfcc 00 0001 Move Floating-Point Single, based on floating-point condition codes fmovsfcc %fccn, freg_rs2, freg_rd A1
FMOVDfcc 00 0010 Move Floating-Point Double, based on floating-point condition codes fmovdfcc %fccn, freg_rs2, freg_rd A1
FMOVQfcc 00 0011 Move Floating-Point Quad, based on floating-point condition codes fmovqfcc %fccn, freg_rs2, freg_rd C3


[Instruction format diagram not recoverable from the source; the fields referenced in this section are rd, cond, opf_cc, opf_low, and rs2]


Encoding of the cond Field for F.P. Moves Based on Integer Condition Codes (icc or xcc)


cond Operation icc / xcc Test icc/xcc name(s) in Assembly Language Mnemonics
1000 Move Always 1 a
0000 Move Never 0 n
1001 Move if Not Equal not Z ne (or nz)
0001 Move if Equal Z e (or z)
1010 Move if Greater not (Z or (N xor V)) g
0010 Move if Less or Equal Z or (N xor V) le
1011 Move if Greater or Equal not (N xor V) ge
0011 Move if Less N xor V l
1100 Move if Greater Unsigned not (C or Z) gu
0100 Move if Less or Equal Unsigned (C or Z) leu
1101 Move if Carry Clear (Greater or Equal, Unsigned) not C cc (or geu)
0101 Move if Carry Set (Less than, Unsigned) C cs (or lu)
1110 Move if Positive not N pos
0110 Move if Negative N neg
1111 Move if Overflow Clear not V vc
0111 Move if Overflow Set V vs


Encoding of the cond Field for F.P. Moves Based on Floating-Point Condition Codes (fccn) 





cond Operation fccn Test fcc name(s) in Assembly Language Mnemonics
1000 Move Always 1 a
0000 Move Never 0 n
0111 Move if Unordered U u
0110 Move if Greater G g
0101 Move if Unordered or Greater G or U ug
0100 Move if Less L l
0011 Move if Unordered or Less L or U ul
0010 Move if Less or Greater L or G lg
0001 Move if Not Equal L or G or U ne (or nz)
1001 Move if Equal E e (or z)
1010 Move if Unordered or Equal E or U ue
1011 Move if Greater or Equal E or G ge
1100 Move if Unordered or Greater or Equal E or G or U uge
1101 Move if Less or Equal E or L le
1110 Move if Unordered or Less or Equal E or L or U ule
1111 Move if Ordered E or L or G o





Encoding of opf_cc Field (also see TABLE E-10 on page 484)

opf_cc Instruction Condition Code to be Tested
100₂ FMOV<s|d|q>icc icc
110₂ FMOV<s|d|q>xcc xcc
000₂ FMOV<s|d|q>fcc fcc0
001₂ FMOV<s|d|q>fcc fcc1
010₂ FMOV<s|d|q>fcc fcc2
011₂ FMOV<s|d|q>fcc fcc3
101₂ (illegal_instruction exception)
111₂ (illegal_instruction exception)


Description


The FMOVcc instructions copy the floating-point register(s) specified by rs2 to the 
floating-point register(s) specified by rd if the condition indicated by the cond field is 
satisfied by the selected floating-point condition code field in FSR. The condition 
code used is specified by the opf_cc field of the instruction. If the condition is 
FALSE, then the destination register(s) are not changed. 





These instructions read, but do not modify, any condition codes. 
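The integer condition-code tests tabulated in this section follow a regular pattern: each encoding with the high cond bit set is the logical negation of the encoding with that bit clear. The Python sketch below exploits that pattern. It is an illustrative model with invented names (icc_test, fmovcc); the N, Z, V, and C arguments stand for the bits of whichever of icc or xcc is selected by opf_cc.

```python
def icc_test(cond, n, z, v, c):
    """Evaluate a 4-bit cond field against integer condition-code bits."""
    n, z, v, c = bool(n), bool(z), bool(v), bool(c)
    base = {
        0b000: False,           # n   (never)
        0b001: z,               # e
        0b010: z or (n != v),   # le
        0b011: n != v,          # l
        0b100: c or z,          # leu
        0b101: c,               # cs
        0b110: n,               # neg
        0b111: v,               # vs
    }[cond & 0b111]
    # cond bit 3 selects the complementary test (a, ne, g, ge, gu, cc, pos, vc).
    return (not base) if cond & 0b1000 else base


def fmovcc(condition_true, f_rs2, f_rd):
    """Return the new contents of F[rd]: F[rs2] if the test holds, else unchanged."""
    return f_rs2 if condition_true else f_rd
```

For example, fmovcc(icc_test(0b1010, n, z, v, c), src, dst) models a move-if-greater: the destination keeps its old value unless the test is TRUE.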


These instructions clear (set to 0) both FSR.cexc and FSR.ftt. They do not round, do
not modify FSR.aexc, and do not treat floating-point NaN values differently from
other floating-point values.


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute an FMOVQicc, FMOVQxcc, or
FMOVQfcc instruction causes an illegal_instruction exception,
allowing privileged software to emulate the instruction.


An attempt to execute an FMOVcc instruction when instruction bit 18 is nonzero or
opf_cc = 101₂ or 111₂ causes an illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FMOVQicc, FMOVQxcc, or FMOVQfcc instruction causes an
fp_disabled exception.


If the FPU is enabled, an attempt to execute an FMOVQicc, FMOVQxcc, or
FMOVQfcc instruction causes an fp_exception_other (with FSR.ftt =
unimplemented_FPop), since that instruction is not implemented in hardware in
UltraSPARC Architecture 2005 implementations.




Programming | Branches cause the performance of most implementations to
Note | degrade significantly. Frequently, the MOVcc and FMOVcc
instructions can be used to avoid branches. For example, the
following C language segment:


double A, B, X;
if (A > B) then X = 1.03; else X = 0.0;


can be coded as


! assume A is in %f0; B is in %f2; %xx points to
! constant area
ldd [%xx+C_1.03],%f4 ! X = 1.03
fcmpd %fcc3,%f0,%f2 ! A > B
fble,a %fcc3,label
! following instruction only executed if the
! preceding branch was taken
fsubd %f4,%f4,%f4 ! X = 0.0
label:...


This code takes four instructions including a branch.
With FMOVcc, this could be coded as


ldd [%xx+C_1.03],%f4 ! X = 1.03
fsubd %f4,%f4,%f6 ! X' = 0.0
fcmpd %fcc3,%f0,%f2 ! A > B
fmovdle %fcc3,%f6,%f4 ! X = 0.0


This code also takes four instructions but requires no branches
and may boost performance significantly. Use MOVcc and
FMOVcc instead of branches wherever these instructions would
improve performance.





Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (opf_cc = 101₂ or 111₂))
fp_exception_other (FSR.ftt = unimplemented_FPop (FMOVQ instructions only))





7.29 Move Floating-Point Register on Integer 
Register Condition (FMOVR) 


Instruction rcond opf_low Operation Test Class
— 000 0 0101 Reserved — —
FMOVRsZ 001 0 0101 Move Single if Register = 0 R[rs1] = 0 A1
FMOVRsLEZ 010 0 0101 Move Single if Register ≤ 0 R[rs1] ≤ 0 A1
FMOVRsLZ 011 0 0101 Move Single if Register < 0 R[rs1] < 0 A1
— 100 0 0101 Reserved — —
FMOVRsNZ 101 0 0101 Move Single if Register ≠ 0 R[rs1] ≠ 0 A1
FMOVRsGZ 110 0 0101 Move Single if Register > 0 R[rs1] > 0 A1
FMOVRsGEZ 111 0 0101 Move Single if Register ≥ 0 R[rs1] ≥ 0 A1
— 000 0 0110 Reserved — —
FMOVRdZ 001 0 0110 Move Double if Register = 0 R[rs1] = 0 A1
FMOVRdLEZ 010 0 0110 Move Double if Register ≤ 0 R[rs1] ≤ 0 A1
FMOVRdLZ 011 0 0110 Move Double if Register < 0 R[rs1] < 0 A1
— 100 0 0110 Reserved — —
FMOVRdNZ 101 0 0110 Move Double if Register ≠ 0 R[rs1] ≠ 0 A1
FMOVRdGZ 110 0 0110 Move Double if Register > 0 R[rs1] > 0 A1
FMOVRdGEZ 111 0 0110 Move Double if Register ≥ 0 R[rs1] ≥ 0 A1
— 000 0 0111 Reserved — —
FMOVRqZ 001 0 0111 Move Quad if Register = 0 R[rs1] = 0 C3
FMOVRqLEZ 010 0 0111 Move Quad if Register ≤ 0 R[rs1] ≤ 0 C3
FMOVRqLZ 011 0 0111 Move Quad if Register < 0 R[rs1] < 0 C3
— 100 0 0111 Reserved — —
FMOVRqNZ 101 0 0111 Move Quad if Register ≠ 0 R[rs1] ≠ 0 C3
FMOVRqGZ 110 0 0111 Move Quad if Register > 0 R[rs1] > 0 C3
FMOVRqGEZ 111 0 0111 Move Quad if Register ≥ 0 R[rs1] ≥ 0 C3


[Instruction format diagram; field boundaries at bits 31 30 | 29 25 | 24 19 | 18 14 | 13 | 12 10 | 9 5 | 4 0]


Description





Assembly Language Syntax

fmovr{s,d,q}z reg_rs1, freg_rs2, freg_rd (synonym: fmovr{s,d,q}e)
fmovr{s,d,q}lez reg_rs1, freg_rs2, freg_rd
fmovr{s,d,q}lz reg_rs1, freg_rs2, freg_rd
fmovr{s,d,q}nz reg_rs1, freg_rs2, freg_rd (synonym: fmovr{s,d,q}ne)
fmovr{s,d,q}gz reg_rs1, freg_rs2, freg_rd
fmovr{s,d,q}gez reg_rs1, freg_rs2, freg_rd








If the contents of integer register R[rs1] satisfy the condition specified in the rcond 
field, these instructions copy the contents of the floating-point register(s) specified 
by the rs2 field to the floating-point register(s) specified by the rd field. If the 
contents of R[rs1] do not satisfy the condition, the floating-point register(s) specified 
by the rd field are not modified. 


These instructions treat the integer register contents as a signed integer value; they 
do not modify any condition codes. 


These instructions clear (set to 0) both FSR.cexc and FSR.ftt. They do not round, do
not modify FSR.aexc, and do not treat floating-point NaN values differently from
other floating-point values.


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute an FMOVRq instruction causes an
illegal_instruction exception, allowing privileged software to
emulate the instruction.


An attempt to execute an FMOVR instruction when instruction bit 13 is nonzero or
rcond = 000₂ or 100₂ causes an illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FMOVR instruction causes an fp_disabled exception.


If the FPU is enabled, an attempt to execute an FMOVRq instruction causes an
fp_exception_other (with FSR.ftt = unimplemented_FPop), since that instruction is
not implemented in hardware in UltraSPARC Architecture 2005 implementations.


Implementation | If this instruction is implemented by tagging each register value
Note | with an N (negative) and a Z (zero) condition bit, use the
following table to determine whether rcond is TRUE:

Branch      Test
FMOVRNZ     not Z
FMOVRZ      Z
FMOVRGEZ    not N
FMOVRLZ     N
FMOVRLEZ    N or Z
FMOVRGZ     N nor Z
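
The table above can be sketched in Python as follows. This is only an illustration, not part of the architecture definition; the function name rcond_true and the use of mnemonic strings as keys are assumptions made for the example.

```python
def rcond_true(mnemonic, value):
    """Evaluate an FMOVR register condition from N and Z tag bits."""
    value &= 0xFFFFFFFFFFFFFFFF      # R[rs1] is treated as a signed 64-bit integer
    n = bool(value >> 63)            # N: sign bit is set
    z = value == 0                   # Z: all bits are zero
    tests = {
        "FMOVRNZ": not z,
        "FMOVRZ": z,
        "FMOVRGEZ": not n,
        "FMOVRLZ": n,
        "FMOVRLEZ": n or z,
        "FMOVRGZ": not (n or z),     # "N nor Z"
    }
    return tests[mnemonic]
```

For example, rcond_true("FMOVRGZ", 5) holds, while the same test on 0 or on a negative value does not.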


Exceptions fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (rcond = 000₂ or 100₂))
fp_exception_other (FSR.ftt = unimplemented_FPop (FMOVRq))
FMUL (partitioned)


7.30 Partitioned Multiply Instructions


Instruction    opf            Operation                                                      s1     s2     d      Assembly Language Syntax                     Class
FMUL8x16       0 0011 0001    Unsigned 8-bit by signed 16-bit partitioned product            f32    f64    f64    fmul8x16     freg_rs1, freg_rs2, freg_rd     C3
FMUL8x16AU     0 0011 0011    Unsigned 8-bit by signed 16-bit upper α partitioned product    f32    f32    f64    fmul8x16au   freg_rs1, freg_rs2, freg_rd     C3
FMUL8x16AL     0 0011 0101    Unsigned 8-bit by signed 16-bit lower α partitioned product    f32    f32    f64    fmul8x16al   freg_rs1, freg_rs2, freg_rd     C3
FMUL8SUx16     0 0011 0110    Signed upper 8-bit by signed 16-bit partitioned product        f64    f64    f64    fmul8sux16   freg_rs1, freg_rs2, freg_rd     C3
FMUL8ULx16     0 0011 0111    Unsigned lower 8-bit by signed 16-bit partitioned product      f64    f64    f64    fmul8ulx16   freg_rs1, freg_rs2, freg_rd     C3
FMULD8SUx16    0 0011 1000    Signed upper 8-bit by signed 16-bit partitioned product        f32    f32    f64    fmuld8sux16  freg_rs1, freg_rs2, freg_rd     C3
FMULD8ULx16    0 0011 1001    Unsigned lower 8-bit by signed 16-bit partitioned product      f32    f32    f64    fmuld8ulx16  freg_rs1, freg_rs2, freg_rd     C3


10 rd 110110 rs1 opf rs2
31 30 29 25 24 19 18 14 13 5 4 0


Programming | When software emulates an 8-bit unsigned by 16-bit signed
Note | multiply, the unsigned value must be zero-extended and the 16-bit
value sign-extended before the multiplication.


Description The following sections describe the versions of partitioned multiplies.


In an UltraSPARC Architecture 2005 implementation, these instructions are not
implemented in hardware, cause an illegal_instruction exception, and are emulated
in software.


Exceptions illegal_instruction
7.30.1 FMUL8x16 Instruction 


FMUL8x16 multiplies each unsigned 8-bit value (for example, a pixel component) in
the 32-bit floating-point register Fs[rs1] by the corresponding (signed) 16-bit fixed-
point integer in the 64-bit floating-point register Fp[rs2]. It rounds the 24-bit product
(assuming binary point between bits 7 and 8) and stores the most significant 16 bits
of the result into the corresponding 16-bit field in the 64-bit floating-point
destination register Fp[rd]. FIGURE 7-10 illustrates the operation.


Note | This instruction treats the pixel component values as fixed-point
with the binary point to the left of the most significant bit.
Typically, this operation is used with filter coefficients as the fixed-
point rs2 value and image data as the rs1 pixel value. Appropriate
scaling of the coefficient allows various fixed-point scaling to be
realized.


FIGURE 7-10 FMUL8x16 Operation 
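
As an informal aid to software emulation, the following Python sketch models the FMUL8x16 description above. It is not part of the architecture definition. The function names are hypothetical, and the rounding step (adding 0x80 before dropping the low 8 bits, so that halfway cases round toward positive infinity) is an assumption borrowed from the wording used for the other partitioned multiplies in this section.

```python
def _s16(x):
    """Interpret the low 16 bits of x as a signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def fmul8x16(rs1_32, rs2_64):
    """Model FMUL8x16: four unsigned 8-bit by signed 16-bit products."""
    result = 0
    for lane in range(4):
        pixel = (rs1_32 >> (8 * lane)) & 0xFF     # zero-extended 8-bit value
        coeff = _s16(rs2_64 >> (16 * lane))       # sign-extended 16-bit value
        product = pixel * coeff                   # fits in 24 bits
        rounded = (product + 0x80) >> 8           # keep the most significant 16 bits
        result |= (rounded & 0xFFFF) << (16 * lane)
    return result
```

With a coefficient of 0x0100 the pixel value passes through unchanged, which is a convenient sanity check for an emulation routine.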
7.30.2 FMUL8x16AU Instruction 


FMUL8x16AU is the same as FMUL8x16, except that one 16-bit fixed-point value is
used as the multiplier for all four multiplies. This multiplier is the most significant
("upper") 16 bits of the 32-bit register Fs[rs2] (typically an α pixel component
value). FIGURE 7-11 illustrates the operation.
















FIGURE 7-11 FMUL8x16AU Operation 


7.30.3 FMUL8x16AL Instruction


FMUL8x16AL is the same as FMUL8x16AU, except that the least significant
("lower") 16 bits of the 32-bit register Fs[rs2] are used as the multiplier.
FIGURE 7-12 illustrates the operation.







FIGURE 7-12 FMUL8x16AL Operation 
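
The two single-multiplier variants differ only in which half of Fs[rs2] supplies the multiplier, as the following Python sketch suggests. It is an illustration only; the function name fmul8x16a, its upper flag, and the 0x80 rounding constant are assumptions of this example rather than definitions from this specification.

```python
def _s16(x):
    """Interpret the low 16 bits of x as a signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def fmul8x16a(rs1_32, rs2_32, upper):
    """Model FMUL8x16AU (upper=True) or FMUL8x16AL (upper=False)."""
    # One signed 16-bit multiplier is shared by all four multiplies.
    coeff = _s16(rs2_32 >> 16) if upper else _s16(rs2_32)
    result = 0
    for lane in range(4):
        pixel = (rs1_32 >> (8 * lane)) & 0xFF
        rounded = (pixel * coeff + 0x80) >> 8
        result |= (rounded & 0xFFFF) << (16 * lane)
    return result
```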


UltraSPARC Architecture 2005 • Draft D0.9.2, 19 Jun 2008
7.30.4 FMUL8SUx16 Instruction 


FMUL8SUx16 multiplies the most significant ("upper") 8 bits of each 16-bit signed
value in the 64-bit floating-point register Fp[rs1] by the corresponding signed 16-bit
fixed-point integer in the 64-bit floating-point register Fp[rs2]. It rounds the
24-bit product toward the nearest representable value and then stores the most
significant 16 bits of the result into the corresponding 16-bit field of the 64-bit
floating-point destination register Fp[rd]. If the product is exactly halfway between
two integers, the result is rounded toward positive infinity. FIGURE 7-13 illustrates the
operation.


FIGURE 7-13 FMUL8SUx16 Operation 


7.30.5 FMUL8ULx16 Instruction


FMUL8ULx16 multiplies the unsigned least significant ("lower") 8 bits of each 16-bit
value in the 64-bit floating-point register Fp[rs1] by the corresponding fixed-point
signed 16-bit integer in the 64-bit floating-point register Fp[rs2]. Each 24-bit product
is sign-extended to 32 bits. The most significant ("upper") 16 bits of the sign-
extended value are rounded to nearest and then stored in the corresponding 16-bit
field of the 64-bit floating-point destination register Fp[rd]. If the result is exactly
halfway between two integers, the result is rounded toward positive infinity.
FIGURE 7-14 illustrates the operation; CODE EXAMPLE 7-1 exemplifies the operation.

FIGURE 7-14 FMUL8ULx16 Operation 


CODE EXAMPLE 7-1 16-bit × 16-bit → 16-bit Multiply


fmul8sux16 
fmul8ulx16 


fpadd16 
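
The arithmetic behind this three-instruction sequence can be sketched for a single 16-bit lane in Python. This is an informal model, not a definition: the function names are hypothetical, and the rounding constants (0x80 for the upper partial product, 0x8000 for the lower one) are one reading of the "rounded to nearest, halfway toward positive infinity" wording above.

```python
def _s16(x):
    """Interpret the low 16 bits of x as a signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def fmul8sux16_lane(a, b):
    """Signed upper 8 bits of a times signed b, rounded, upper 16 of 24 bits."""
    hi = _s16(a) >> 8
    return ((hi * _s16(b) + 0x80) >> 8) & 0xFFFF

def fmul8ulx16_lane(a, b):
    """Unsigned lower 8 bits of a times signed b, rounded, upper 16 of 32 bits."""
    lo = a & 0xFF
    return ((lo * _s16(b) + 0x8000) >> 16) & 0xFFFF

def mul16x16_hi16(a, b):
    """Combine the two partial products as fpadd16 would (carry out discarded)."""
    return (fmul8sux16_lane(a, b) + fmul8ulx16_lane(a, b)) & 0xFFFF
```

Because each partial product is rounded separately, the sum can differ by one unit from the truncated upper 16 bits of the exact 32-bit product.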





7.30.6 FMULD8SUx16 Instruction 


FMULD8SUx16 multiplies the most significant ("upper") 8 bits of each 16-bit signed
value in F[rs1] by the corresponding signed 16-bit fixed-point value in F[rs2]. Each
24-bit product is shifted left by 8 bits to generate a 32-bit result, which is then stored
in the 64-bit floating-point register specified by rd. FIGURE 7-15 illustrates the
operation.


FIGURE 7-15 FMULD8SUx16 Operation 
7.30.7 FMULD8ULx16 Instruction 


FMULD8ULx16 multiplies the unsigned least significant ("lower") 8 bits of each
16-bit value in F[rs1] by the corresponding 16-bit fixed-point signed integer in F[rs2].
Each 24-bit product is sign-extended to 32 bits and stored in the corresponding half
of the 64-bit floating-point register specified by rd. FIGURE 7-16 illustrates the
operation; CODE EXAMPLE 7-2 exemplifies the operation.


FIGURE 7-16 FMULD8ULx16 Operation 


CODE EXAMPLE 7-2 16-bit × 16-bit → 32-bit Multiply


fmuld8sux16 
fmuld8ulx16 


fpadd32 
FMUL<s|d|q>


7.31 Floating-Point Multiply


Instruction    op3        opf            Operation                    Assembly Language Syntax                Class
FMULs          11 0100    0 0100 1001    Multiply Single              fmuls   freg_rs1, freg_rs2, freg_rd     A1
FMULd          11 0100    0 0100 1010    Multiply Double              fmuld   freg_rs1, freg_rs2, freg_rd     A1
FMULq          11 0100    0 0100 1011    Multiply Quad                fmulq   freg_rs1, freg_rs2, freg_rd     C3
FsMULd         11 0100    0 0110 1001    Multiply Single to Double    fsmuld  freg_rs1, freg_rs2, freg_rd     A1
FdMULq         11 0100    0 0110 1110    Multiply Double to Quad      fdmulq  freg_rs1, freg_rs2, freg_rd     C3





31 30 29 25 24 19 18 14 13 5 4 0 


Description The floating-point multiply instructions multiply the contents of the floating-point 
register(s) specified by the rs1 field by the contents of the floating-point register(s) 
specified by the rs2 field. The instructions then write the product into the floating- 
point register(s) specified by the rd field. 


The FsMULd instruction provides the exact double-precision product of two single- 
precision operands, without underflow, overflow, or rounding error. Similarly, 
FdMULq provides the exact quad-precision product of two double-precision
operands. 
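
The exactness of FsMULd follows from operand widths: two 24-bit significands produce at most a 48-bit product, which fits in the 53-bit significand of a double. The Python sketch below checks this with exact rational arithmetic. It is an illustration, not a model of the hardware; the helper names to_single and fsmuld are hypothetical.

```python
import struct
from fractions import Fraction

def to_single(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def fsmuld(a, b):
    """Double-precision product of two single-precision operands."""
    return to_single(a) * to_single(b)

a = to_single(0.1)
b = to_single(3.0000001)
# Fraction(float) converts exactly, so this compares the true product
# with the double-precision result.
exact = Fraction(a) * Fraction(b)
is_exact = Fraction(fsmuld(a, b)) == exact
```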


Rounding is performed as specified by FSR.rd. 


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute an FMULq or FdMULq instruction
causes an illegal_instruction exception, allowing privileged
software to emulate the instruction.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute any FMUL instruction causes an fp_disabled exception.


If the FPU is enabled, an attempt to execute an FMULq or FdMULq instruction
causes an fp_exception_other (with FSR.ftt = unimplemented_FPop), since that
instruction is not implemented in hardware in UltraSPARC Architecture 2005
implementations.


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 


Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FMULq, FdMULq only))
fp_exception_other (FSR.ftt = unfinished_FPop)
fp_exception_ieee_754 (any: NV; FMUL<s|d|q> only: OF, UF, NX)
FNEG


7.32 Floating-Point Negate


Instruction    op3        opf            Operation        Assembly Language Syntax     Class
FNEGs          11 0100    0 0000 0101    Negate Single    fnegs  freg_rs2, freg_rd     A1
FNEGd          11 0100    0 0000 0110    Negate Double    fnegd  freg_rs2, freg_rd     A1
FNEGq          11 0100    0 0000 0111    Negate Quad      fnegq  freg_rs2, freg_rd     C3


31 30 29 25 24 19 18 14 13 5 4 0


Description FNEG copies the source floating-point register(s) to the destination floating-point
register(s), with the sign bit complemented.


These instructions clear (set to 0) both FSR.cexc and FSR.ftt. They do not round, do
not modify FSR.aexc, and do not treat floating-point NaN values differently from
other floating-point values.


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute an FNEGq instruction causes an
illegal_instruction exception, allowing privileged software to
emulate the instruction.


An attempt to execute an FNEG instruction when instruction bits 18:14 are nonzero
causes an illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FNEG instruction causes an fp_disabled exception.


Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FNEGq only))
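
Because FNEG only complements the sign bit, it can be modeled on raw bit patterns without interpreting the value. The Python sketch below does this for the double-precision case. The function name fnegd is simply reused here as a hypothetical helper, and the example is not part of the architecture definition.

```python
def fnegd(bits64):
    """Complement the sign bit (bit 63) of a double-precision bit pattern."""
    return (bits64 ^ (1 << 63)) & 0xFFFFFFFFFFFFFFFF
```

Applied to 0x3FF8000000000000 (1.5) this yields 0xBFF8000000000000 (-1.5). A NaN pattern is handled the same way, with only its sign bit changed.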
FPACK


7.33 FPACK


Instruction    opf            Operation                                 s1     s2     d      Assembly Language Syntax                  Class
FPACK16        0 0011 1011    Four 16-bit packs into 8 unsigned bits    —      f64    f32    fpack16   freg_rs2, freg_rd               C3
FPACK32        0 0011 1010    Two 32-bit packs into 8 unsigned bits     f64    f64    f64    fpack32   freg_rs1, freg_rs2, freg_rd     C3
FPACKFIX       0 0011 1101    Two 32-bit packs into 16 signed bits      —      f64    f32    fpackfix  freg_rs2, freg_rd               C3


31 30 29 25 24 19 18 14 13 5 4 0


Description The FPACK instructions convert multiple values in a source register to a lower-
precision fixed or pixel format and store the resulting values in the destination
register. Input values are clipped to the dynamic range of the output format. Packing
applies a scale factor from GSR.scale to allow flexible positioning of the binary
point. See the subsections on following pages for more detailed descriptions of the
operations of these instructions.


In an UltraSPARC Architecture 2005 implementation, these instructions are not
implemented in hardware, cause an illegal_instruction exception, and are emulated
in software.


An attempt to execute an FPACK16 or FPACKFIX instruction when rs1 ≠ 0 causes an
illegal_instruction exception.


Exceptions illegal_instruction


See Also FEXPAND on page 172 
FPMERGE on page 206 
7.33.1 FPACK16


FPACK16 takes four 16-bit fixed values from the 64-bit floating-point register
Fp[rs2], scales, truncates, and clips them into four 8-bit unsigned integers, and stores
the results in the 32-bit destination register, Fs[rd]. FIGURE 7-17 illustrates the
FPACK16 operation.







FIGURE 7-17 FPACK16 Operation 


Note | FPACK16 ignores the most significant bit of GSR.scale 
(GSR.scale{4}). 


This operation is carried out as follows: 


1. Left-shift the value from Fp[rs2] by the number of bits specified in GSR.scale 
while maintaining clipping information. 


2. Truncate and clip to an 8-bit unsigned integer starting at the bit immediately to 
the left of the implicit binary point (that is, between bits 7 and 6 for each 16-bit 
word). Truncation converts the scaled value into a signed integer (that is, round 
toward negative infinity). If the resulting value is negative (that is, its most 
significant bit is set), 0 is returned as the clipped value. If the value is greater than 
255, then 255 is delivered as the clipped value. Otherwise, the scaled value is 
returned as the result. 


3. Store the result in the corresponding byte in the 32-bit destination register, Fs[rd].


For each 16-bit partition, the sequence of operations performed is shown in the 
following example pseudo-code: 


tmp ← source_operand{15:0} << GSR.scale;
// Pick off the bits from bit position 15+GSR.scale to
// bit position 7 from the shifted result
trunc_signed_value ← tmp{(15+GSR.scale):7};
if (trunc_signed_value < 0)
    unsigned_8bit_result ← 0;
else if (trunc_signed_value > 255)
    unsigned_8bit_result ← 255;
else
    unsigned_8bit_result ← trunc_signed_value{14:7};
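
A runnable counterpart to the pseudo-code above, written in Python, may help when testing an emulation routine. It is a sketch under stated assumptions, not a definition: the function name and argument layout are hypothetical, Python's unbounded integers stand in for "maintaining clipping information", and each byte of the result is taken from the same lane position as its 16-bit source field.

```python
def fpack16(rs2_64, gsr_scale):
    """Model FPACK16: four signed 16-bit values to four unsigned bytes."""
    scale = gsr_scale & 0xF                 # GSR.scale{4} is ignored by FPACK16
    result = 0
    for lane in range(4):
        field = (rs2_64 >> (16 * lane)) & 0xFFFF
        signed = field - 0x10000 if field & 0x8000 else field
        # Shift left by scale, then drop the 7 fraction bits; the arithmetic
        # right shift rounds toward negative infinity.
        trunc = (signed << scale) >> 7
        clipped = max(0, min(255, trunc))   # clip to the unsigned 8-bit range
        result |= clipped << (8 * lane)
    return result
```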


7.33.2 FPACK32


FPACK32 takes two 32-bit fixed values from the second source operand (64-bit 
floating-point register Fp[rs2]) and scales, truncates, and clips them into two 8-bit 
unsigned integers. The two 8-bit integers are merged at the corresponding least 
significant byte positions of each 32-bit word in the 64-bit floating-point register 
Fp[rs1], left-shifted by 8 bits. The 64-bit result is stored in Fp[rd]. Thus, successive 
FPACK32 instructions can assemble two pixels by using three or four pairs of 32-bit 
fixed values. FIGURE 7-18 illustrates the FPACK32 operation. 










FIGURE 7-18 FPACK32 Operation 


This operation, illustrated in FIGURE 7-18, is carried out as follows: 


1. Left-shift each 32-bit value in Fp[rs2] by the number of bits specified in
GSR.scale, while maintaining clipping information.

2. For each 32-bit value, truncate and clip to an 8-bit unsigned integer starting at the
bit immediately to the left of the implicit binary point (that is, between bits 23 and
22 for each 32-bit word). Truncation is performed to convert the scaled value into
a signed integer (that is, round toward negative infinity). If the resulting value is
negative (that is, the most significant bit is 1), then 0 is returned as the clipped
value. If the value is greater than 255, then 255 is delivered as the clipped value.
Otherwise, the scaled value is returned as the result.


3. Left-shift each 32-bit value from Fp[rs1] by 8 bits.


4. Merge the two clipped 8-bit unsigned values into the corresponding least
significant byte positions in the left-shifted Fp[rs1] value.


5. Store the result in the 64-bit destination register Fp[rd].


For each 32-bit partition, the sequence of operations performed is shown in the 
following pseudo-code: 


tmp ← source_operand2{31:0} << GSR.scale;
// Pick off the bits from bit position 31+GSR.scale to
// bit position 23 from the shifted result
trunc_signed_value ← tmp{(31+GSR.scale):23};
if (trunc_signed_value < 0)
    unsigned_8bit_value ← 0;
else if (trunc_signed_value > 255)
    unsigned_8bit_value ← 255;
else
    unsigned_8bit_value ← trunc_signed_value{30:23};
Final_32bit_Result ← (source_operand1{31:0} << 8) |
                     (unsigned_8bit_value{7:0});
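
The same sequence can be written as a short Python function for experimentation. This is a sketch only; the function name and argument order are hypothetical, and the bits shifted out of each 32-bit word of the first operand are assumed to be discarded, as the 32-bit result in the pseudo-code implies.

```python
def fpack32(rs1_64, rs2_64, gsr_scale):
    """Model FPACK32: clip two 32-bit values to bytes and merge into rs1 << 8."""
    scale = gsr_scale & 0x1F
    result = 0
    for lane in range(2):
        word = (rs2_64 >> (32 * lane)) & 0xFFFFFFFF
        signed = word - (1 << 32) if word & 0x80000000 else word
        trunc = (signed << scale) >> 23            # drop the 23 fraction bits
        clipped = max(0, min(255, trunc))
        base = (rs1_64 >> (32 * lane)) & 0xFFFFFFFF
        merged = ((base << 8) & 0xFFFFFF00) | clipped
        result |= merged << (32 * lane)
    return result
```

Calling the function repeatedly with its own result as the first operand mirrors the way successive FPACK32 instructions accumulate pixel components.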
7.33.3 FPACKFIX


FPACKFIX takes two 32-bit fixed values from the 64-bit floating-point register
Fp[rs2], scales, truncates, and clips them into two 16-bit signed integers, and then
stores the result in the 32-bit destination register Fs[rd]. FIGURE 7-19 illustrates the
FPACKFIX operation.








































FIGURE 7-19 FPACKFIX Operation 


This operation is carried out as follows: 


1. Left-shift each 32-bit value from Fp[rs2] by the number of bits specified in
GSR.scale, while maintaining clipping information.


2. For each 32-bit value, truncate and clip to a 16-bit signed integer starting at the
bit immediately to the left of the implicit binary point (that is, between bits 16 and
15 for each 32-bit word). Truncation is performed to convert the scaled value into
a signed integer (that is, round toward negative infinity). If the resulting value is
less than -32768, then -32768 is returned as the clipped value. If the value is
greater than 32767, then 32767 is delivered as the clipped value. Otherwise, the
scaled value is returned as the result.


3. Store the result in the 32-bit destination register Fs[rd]. 


For each 32-bit partition, the sequence of operations performed is shown in the 
following pseudo-code: 

tmp ← source_operand{31:0} << GSR.scale;
// Pick off the bits from bit position 31+GSR.scale to
// bit position 16 from the shifted result
trunc_signed_value ← tmp{(31+GSR.scale):16};
if (trunc_signed_value < -32768)
    signed_16bit_result ← -32768;
else if (trunc_signed_value > 32767)
    signed_16bit_result ← 32767;
else
    signed_16bit_result ← trunc_signed_value{31:16};
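
For comparison with the pseudo-code, a Python sketch of the same steps follows. It is illustrative only; the function name is hypothetical, and the full 5-bit GSR.scale value is assumed to be used, since only FPACK16 is described as ignoring the most significant bit.

```python
def fpackfix(rs2_64, gsr_scale):
    """Model FPACKFIX: two signed 32-bit values to two signed 16-bit values."""
    scale = gsr_scale & 0x1F
    result = 0
    for lane in range(2):
        word = (rs2_64 >> (32 * lane)) & 0xFFFFFFFF
        signed = word - (1 << 32) if word & 0x80000000 else word
        trunc = (signed << scale) >> 16               # drop the 16 fraction bits
        clipped = max(-32768, min(32767, trunc))      # clip to the signed 16-bit range
        result |= (clipped & 0xFFFF) << (16 * lane)
    return result
```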
7.34 Fixed-point Partitioned Add [VIS 1]


Instruction    opf            Operation           s1     s2     d      Assembly Language Syntax                  Class
FPADD16        0 0101 0000    Four 16-bit adds    f64    f64    f64    fpadd16   freg_rs1, freg_rs2, freg_rd     A1
FPADD16S       0 0101 0001    Two 16-bit adds     f32    f32    f32    fpadd16s  freg_rs1, freg_rs2, freg_rd     A1
FPADD32        0 0101 0010    Two 32-bit adds     f64    f64    f64    fpadd32   freg_rs1, freg_rs2, freg_rd     A1
FPADD32S       0 0101 0011    One 32-bit add      f32    f32    f32    fpadd32s  freg_rs1, freg_rs2, freg_rd     A1


31 30 29 25 24 19 18 14 13 5 4 0


Description | FPADD16 (FPADD32) performs four 16-bit (two 32-bit) partitioned additions 
between the corresponding fixed-point values contained in the source operands 
(Fp[rs1], Fp[rs2]). The result is placed in the destination register, Fp[rd]. 


The 32-bit versions of these instructions (FPADD16S and FPADD32S) perform two 
16-bit or one 32-bit partitioned additions. 


Any carry out from each addition is discarded and a 2’s-complement arithmetic 
result is produced. 
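
The lane-wise behavior, including the discarded carry, can be sketched in Python as shown below. The helper name fpadd and its lane_bits and total_bits parameters are hypothetical conveniences that let one function cover all four variants. The sketch is not part of the architecture definition.

```python
def fpadd(rs1, rs2, lane_bits, total_bits):
    """Partitioned add: independent lanes, carry out of each lane discarded."""
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, total_bits, lane_bits):
        lane_sum = ((rs1 >> shift) + (rs2 >> shift)) & mask
        result |= lane_sum << shift
    return result

# FPADD16 corresponds to fpadd(a, b, 16, 64), FPADD16S to fpadd(a, b, 16, 32),
# FPADD32 to fpadd(a, b, 32, 64), and FPADD32S to fpadd(a, b, 32, 32).
```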





FIGURE 7-20 FPADD16 Operation


FIGURE 7-21 FPADD32 Operation


FIGURE 7-22 FPADD16S Operation


FIGURE 7-23 FPADD32S Operation
If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FPADD instruction causes an fp_disabled exception.


Exceptions fp_disabled
7.35 FPMERGE


Instruction    opf            Operation            s1     s2     d      Assembly Language Syntax                 Class
FPMERGE        0 0100 1011    Two 32-bit merges    f32    f32    f64    fpmerge  freg_rs1, freg_rs2, freg_rd     C3


31 30 29 25 24 19 18 14 13 5 4 0


Description FPMERGE interleaves eight 8-bit unsigned values in Fs[rs1] and Fs[rs2] to produce
a 64-bit value in the destination register Fp[rd]. This instruction converts from
packed to planar representation when it is applied twice in succession; for example,
R1G1B1A1,R3G3B3A3 → R1R3G1G3B1B3A1A3 → R1R2R3R4G1G2G3G4.


FPMERGE also converts from planar to packed when it is applied twice in
succession; for example, R1R2R3R4,B1B2B3B4 → R1B1R2B2R3B3R4B4 →
R1G1B1A1R2G2B2A2.


FIGURE 7-24 illustrates the operation. 
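
The byte interleave can be modeled in a few lines of Python, which also makes it easy to confirm the two-step packed-to-planar conversion described above. This is a sketch, not a definition; the function name is hypothetical, and the byte ordering (each Fs[rs1] byte placed in the more significant position of its pair) is assumed from the examples in the text.

```python
def fpmerge(rs1_32, rs2_32):
    """Interleave the bytes of two 32-bit values into one 64-bit value."""
    result = 0
    for i in range(4):
        a = (rs1_32 >> (8 * i)) & 0xFF      # byte i of Fs[rs1]
        b = (rs2_32 >> (8 * i)) & 0xFF      # byte i of Fs[rs2]
        result |= ((a << 8) | b) << (16 * i)
    return result

# Pixels encoded as R, G, B, A bytes; the second hex digit is the pixel number.
p1, p2, p3, p4 = 0x11213141, 0x12223242, 0x13233343, 0x14243444
step1_a = fpmerge(p1, p3)                        # R1 R3 G1 G3 B1 B3 A1 A3
step1_b = fpmerge(p2, p4)                        # R2 R4 G2 G4 B2 B4 A2 A4
planar_rg = fpmerge(step1_a >> 32, step1_b >> 32)  # R1 R2 R3 R4 G1 G2 G3 G4
```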




FIGURE 7-24 FPMERGE Operation 








! %d0, %d2              !R1 G1 B1 A1 R2 G2 B2 A2
                        !R3 G3 B3 A3 R4 G4 B4 A4   packed representation
fpmerge %f0, %f2, %d4   !R1 R3 G1 G3 B1 B3 A1 A3
fpmerge %f1, %f3, %d6   !R2 R4 G2 G4 B2 B4 A2 A4   intermediate
fpmerge %f4, %f6, %d0   !R1 R2 R3 R4 G1 G2 G3 G4
fpmerge %f5, %f7, %d2   !B1 B2 B3 B4 A1 A2 A3 A4   planar representation

fpmerge %f0, %f2, %d4   !R1 B1 R2 B2 R3 B3 R4 B4
fpmerge %f1, %f3, %d6   !G1 A1 G2 A2 G3 A3 G4 A4   intermediate
fpmerge %f4, %f6, %d0   !R1 G1 B1 A1 R2 G2 B2 A2
fpmerge %f5, %f7, %d2   !R3 G3 B3 A3 R4 G4 B4 A4   packed representation

CODE EXAMPLE 7-3 FPMERGE


In an UltraSPARC Architecture 2005 implementation, these instructions are not
implemented in hardware, cause an illegal_instruction exception, and are emulated
in software.


Exceptions illegal_instruction 


See Also FPACK on page 197 
FEXPAND on page 172 
7.36 Fixed-point Partitioned Subtract (64-bit) [VIS 1]


Instruction    opf            Operation                s1     s2     d      Assembly Language Syntax                  Class
FPSUB16        0 0101 0100    Four 16-bit subtracts    f64    f64    f64    fpsub16   freg_rs1, freg_rs2, freg_rd     A1
FPSUB16S       0 0101 0101    Two 16-bit subtracts     f32    f32    f32    fpsub16s  freg_rs1, freg_rs2, freg_rd     A1
FPSUB32        0 0101 0110    Two 32-bit subtracts     f64    f64    f64    fpsub32   freg_rs1, freg_rs2, freg_rd     A1
FPSUB32S       0 0101 0111    One 32-bit subtract      f32    f32    f32    fpsub32s  freg_rs1, freg_rs2, freg_rd     A1


31 30 29 25 24 19 18 14 13 5 4 0


Description FPSUB16 (FPSUB32) performs four 16-bit (two 32-bit) partitioned subtractions
between the corresponding fixed-point values contained in the source operands
(Fp[rs1], Fp[rs2]). The values in Fp[rs2] are subtracted from those in Fp[rs1], and
the result is placed in the destination register, Fp[rd].


The 32-bit versions of these instructions (FPSUB16S and FPSUB32S) perform two
16-bit or one 32-bit partitioned subtractions.


Any carry out from each subtraction is discarded and a 2's-complement arithmetic 
result is produced. 
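
A lane-wise model in Python shows the wraparound behavior. As before, this is only a sketch; the helper name fpsub and its lane_bits and total_bits parameters are hypothetical.

```python
def fpsub(rs1, rs2, lane_bits, total_bits):
    """Partitioned subtract: rs1 lane minus rs2 lane, wrapping within each lane."""
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, total_bits, lane_bits):
        # Masking a negative Python integer yields its 2's-complement low bits.
        lane_diff = ((rs1 >> shift) - (rs2 >> shift)) & mask
        result |= lane_diff << shift
    return result
```

For instance, a two-lane 16-bit subtract of 0x00010003 from 0x00000005 gives 0xFFFF0002: the upper lane wraps to -1 without borrowing from, or into, the lower lane.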





FIGURE 7-25 FPSUB16 Operation


FIGURE 7-26 FPSUB32 Operation


FIGURE 7-27 FPSUB16S Operation


FIGURE 7-28 FPSUB32S Operation
If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FPSUB instruction causes an fp_disabled exception.


Exceptions fp_disabled
7.37 F Register Logical Operate (1 operand)


Instruction    opf            Operation            Assembly Language Syntax    Class
FZERO          0 0110 0000    Zero fill            fzero   freg_rd             A1
FZEROs         0 0110 0001    Zero fill, 32-bit    fzeros  freg_rd             A1
FONE           0 0111 1110    One fill             fone    freg_rd             A1
FONEs          0 0111 1111    One fill, 32-bit     fones   freg_rd             A1


31 30 29 25 24 19 18 14 13 5 4 0
Description FZERO and FONE fill the 64-bit destination register, Fp[rd], with all ‘0’ bits or all ‘1’
bits (respectively).


FZEROs and FONEs fill the 32-bit destination register, Fs[rd], with all ‘0’ bits or all
‘1’ bits (respectively).


An attempt to execute an FZERO or FONE instruction when instruction bits 18:14 or
bits 4:0 are nonzero causes an illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FZERO[s] or FONE[s] instruction causes an fp_disabled
exception.


Exceptions illegal_instruction
fp_disabled


See Also F Register 2-operand Logical Operations on page 212 
F Register 3-operand Logical Operations on page 214 




F Register 2-operand Logical Ops


7.38 F Register Logical Operate (2 operand)


Instruction    opf            Operation                                   Assembly Language Syntax        Class
FSRC1          0 0111 0100    Copy Fp[rs1] to Fp[rd]                      fsrc1   freg_rs1, freg_rd       A1
FSRC1s         0 0111 0101    Copy Fs[rs1] to Fs[rd], 32-bit              fsrc1s  freg_rs1, freg_rd       A1
FSRC2          0 0111 1000    Copy Fp[rs2] to Fp[rd]                      fsrc2   freg_rs2, freg_rd       A1
FSRC2s         0 0111 1001    Copy Fs[rs2] to Fs[rd], 32-bit              fsrc2s  freg_rs2, freg_rd       A1
FNOT1          0 0110 1010    Negate (1’s complement) Fp[rs1]             fnot1   freg_rs1, freg_rd       A1
FNOT1s         0 0110 1011    Negate (1’s complement) Fs[rs1], 32-bit     fnot1s  freg_rs1, freg_rd       A1
FNOT2          0 0110 0110    Negate (1’s complement) Fp[rs2]             fnot2   freg_rs2, freg_rd       A1
FNOT2s         0 0110 0111    Negate (1’s complement) Fs[rs2], 32-bit     fnot2s  freg_rs2, freg_rd       A1


10 rd 110110 rs1 opf —
10 rd 110110 — opf rs2
31 30 29 25 24 19 18 14 13 5 4 0


Description The standard 64-bit versions of these instructions perform one of four 64-bit logical
operations on the 64-bit floating-point register Fp[rs1] (or Fp[rs2]) and store the
result in the 64-bit floating-point destination register Fp[rd].


The 32-bit (single-precision) versions of these instructions perform 32-bit logical
operations on Fs[rs1] (or Fs[rs2]) and store the result in Fs[rd].


An attempt to execute an FSRC1(s) or FNOT1(s) instruction when instruction bits 4:0
are nonzero causes an illegal_instruction exception. An attempt to execute an
FSRC2(s) or FNOT2(s) instruction when instruction bits 18:14 are nonzero causes an
illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FSRC1[s], FNOT1[s], FSRC2[s], or FNOT2[s] instruction causes
an fp_disabled exception.


Programming | FSRC1s (FSRC1) functions similarly to FMOVs (FMOVd), except
Note | that FSRC1s (FSRC1) does not modify the FSR register while
FMOVs (FMOVd) updates some fields of FSR (see Floating-Point
Move on page 178). Programmers are encouraged to use FMOVs
(FMOVd) instead of FSRC1s (FSRC1) whenever practical.


Exceptions illegal_instruction
fp_disabled


See Also Floating-Point Move on page 178
F Register 1-operand Logical Operations on page 211
F Register 3-operand Logical Operations on page 214
7.39 F Register Logical Operate (3 operand)


Instruction    opf            Operation                          Assembly Language Syntax                   Class
FOR            0 0111 1100    Logical or                         for        freg_rs1, freg_rs2, freg_rd     A1
FORs           0 0111 1101    Logical or, 32-bit                 fors       freg_rs1, freg_rs2, freg_rd     A1
FNOR           0 0110 0010    Logical nor                        fnor       freg_rs1, freg_rs2, freg_rd     A1
FNORs          0 0110 0011    Logical nor, 32-bit                fnors      freg_rs1, freg_rs2, freg_rd     A1
FAND           0 0111 0000    Logical and                        fand       freg_rs1, freg_rs2, freg_rd     A1
FANDs          0 0111 0001    Logical and, 32-bit                fands      freg_rs1, freg_rs2, freg_rd     A1
FNAND          0 0110 1110    Logical nand                       fnand      freg_rs1, freg_rs2, freg_rd     A1
FNANDs         0 0110 1111    Logical nand, 32-bit               fnands     freg_rs1, freg_rs2, freg_rd     A1
FXOR           0 0110 1100    Logical xor                        fxor       freg_rs1, freg_rs2, freg_rd     A1
FXORs          0 0110 1101    Logical xor, 32-bit                fxors      freg_rs1, freg_rs2, freg_rd     A1
FXNOR          0 0111 0010    Logical xnor                       fxnor      freg_rs1, freg_rs2, freg_rd     A1
FXNORs         0 0111 0011    Logical xnor, 32-bit               fxnors     freg_rs1, freg_rs2, freg_rd     A1
FORNOT1        0 0111 1010    (not F[rs1]) or F[rs2]             fornot1    freg_rs1, freg_rs2, freg_rd     A1
FORNOT1s       0 0111 1011    (not F[rs1]) or F[rs2], 32-bit     fornot1s   freg_rs1, freg_rs2, freg_rd     A1
FORNOT2        0 0111 0110    F[rs1] or (not F[rs2])             fornot2    freg_rs1, freg_rs2, freg_rd     A1
FORNOT2s       0 0111 0111    F[rs1] or (not F[rs2]), 32-bit     fornot2s   freg_rs1, freg_rs2, freg_rd     A1
FANDNOT1       0 0110 1000    (not F[rs1]) and F[rs2]            fandnot1   freg_rs1, freg_rs2, freg_rd     A1
FANDNOT1s      0 0110 1001    (not F[rs1]) and F[rs2], 32-bit    fandnot1s  freg_rs1, freg_rs2, freg_rd     A1
FANDNOT2       0 0110 0100    F[rs1] and (not F[rs2])            fandnot2   freg_rs1, freg_rs2, freg_rd     A1
FANDNOT2s      0 0110 0101    F[rs1] and (not F[rs2]), 32-bit    fandnot2s  freg_rs1, freg_rs2, freg_rd     A1


10 rd 110110 rs1 opf rs2
31 30 29 25 24 19 18 14 13 5 4 0
Description The standard 64-bit versions of these instructions perform one of ten 64-bit logical
operations between the 64-bit floating-point registers Fp[rs1] and Fp[rs2]. The result
is stored in the 64-bit floating-point destination register Fp[rd].


The 32-bit (single-precision) versions of these instructions perform 32-bit logical
operations between Fs[rs1] and Fs[rs2], storing the result in Fs[rd].


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute any 3-operand F Register Logical Operate instruction causes an
fp_disabled exception.


Exceptions fp_disabled


See Also F Register 1-operand Logical Operations on page 211
F Register 2-operand Logical Operations on page 212
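
The ten operations can be tabulated compactly in Python, which may be useful as a reference when checking an emulator. The sketch is illustrative only; the dictionary LOGICAL_OPS and the function f_logical are hypothetical names, and the bits parameter selects between the 64-bit and 32-bit forms.

```python
LOGICAL_OPS = {
    "for":      lambda a, b: a | b,
    "fnor":     lambda a, b: ~(a | b),
    "fand":     lambda a, b: a & b,
    "fnand":    lambda a, b: ~(a & b),
    "fxor":     lambda a, b: a ^ b,
    "fxnor":    lambda a, b: ~(a ^ b),
    "fornot1":  lambda a, b: ~a | b,      # (not F[rs1]) or F[rs2]
    "fornot2":  lambda a, b: a | ~b,      # F[rs1] or (not F[rs2])
    "fandnot1": lambda a, b: ~a & b,      # (not F[rs1]) and F[rs2]
    "fandnot2": lambda a, b: a & ~b,      # F[rs1] and (not F[rs2])
}

def f_logical(op, rs1, rs2, bits=64):
    """Apply one 3-operand logical operation to register-width values."""
    mask = (1 << bits) - 1
    return LOGICAL_OPS[op](rs1 & mask, rs2 & mask) & mask
```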
7.40 Floating-Point Square Root


Instruction    op3        opf            Operation             Assembly Language Syntax      Class
FSQRTs         11 0100    0 0010 1001    Square Root Single    fsqrts  freg_rs2, freg_rd     A1
FSQRTd         11 0100    0 0010 1010    Square Root Double    fsqrtd  freg_rs2, freg_rd     A1
FSQRTq         11 0100    0 0010 1011    Square Root Quad      fsqrtq  freg_rs2, freg_rd     C3


31 30 29 25 24 19 18 14 13 5 4 0


Description These SPARC V9 instructions generate the square root of the floating-point operand
in the floating-point register(s) specified by the rs2 field and place the result in the
destination floating-point register(s) specified by the rd field. Rounding is performed
as specified by FSR.rd.


Note | UltraSPARC Architecture 2005 processors do not implement in
hardware instructions that refer to quad-precision floating-point
registers. An attempt to execute an FSQRTq instruction causes an
illegal_instruction exception, allowing privileged software to
emulate the instruction.


An attempt to execute an FSQRT instruction when instruction bits 18:14 are nonzero
causes an illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FSQRT instruction causes an fp_disabled exception.


If the FPU is enabled, an fp_exception_other (with FSR.ftt = unimplemented_FPop)
exception occurs, since the FSQRT instructions are not implemented in hardware in
UltraSPARC Architecture 2005 implementations.


For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754-
1985 Requirements for UltraSPARC Architecture 2005.


Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FSQRT is not implemented
in hardware))
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7.41 


Convert Floating-Point to Integer 


Instruction opf Operation s1 s2 d Assembly Language Syntax Class
FsTOx 010000001 Convert Single to 64-bit Integer — f32 i64 fstox freg_rs2, freg_rd A1
FdTOx 010000010 Convert Double to 64-bit Integer — f64 i64 fdtox freg_rs2, freg_rd A1
FqTOx 010000011 Convert Quad to 64-bit Integer — f128 i64 fqtox freg_rs2, freg_rd C3
FsTOi 011010001 Convert Single to 32-bit Integer — f32 i32 fstoi freg_rs2, freg_rd A1
FdTOi 011010010 Convert Double to 32-bit Integer — f64 i32 fdtoi freg_rs2, freg_rd A1
FqTOi 011010011 Convert Quad to 32-bit Integer — f128 i32 fqtoi freg_rs2, freg_rd C3

10 rd op3 — opf rs2
31 30 29 25 24 19 18 14 13 5 4 0

Description FsTOx, FdTOx, and FqTOx convert the floating-point operand in the floating-point
register(s) specified by rs2 to a 64-bit integer in the floating-point register F_D[rd].

FsTOi, FdTOi, and FqTOi convert the floating-point operand in the floating-point
register(s) specified by rs2 to a 32-bit integer in the floating-point register F_S[rd].


The result is always rounded toward zero; that is, the rounding direction (rd) field of 
the FSR register is ignored. 
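The round-toward-zero behavior, together with the out-of-range cases this section goes on to describe, can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and the saturated value returned for NaN, infinity, or an out-of-range operand (largest positive integer when the operand's sign bit is 0, most negative integer when it is 1) is an assumption standing in for the Integer Overflow Definition, which is not reproduced here. Inexact (NX) signaling is not modeled.

```python
import math


def f_to_int(value: float, bits: int):
    """Model an F<s|d>TO<i|x>-style conversion.

    Returns (integer_result, invalid), where `invalid` indicates that the
    "invalid" (NV) condition would be raised. `bits` is 32 or 64.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if math.isnan(value) or math.isinf(value):
        in_range = False
    else:
        truncated = math.trunc(value)  # rounds toward zero; FSR.rd plays no part
        in_range = lo <= truncated <= hi
    if in_range:
        return truncated, False
    # ASSUMPTION: saturate according to the operand's sign bit.
    negative = math.copysign(1.0, value) < 0
    return (lo if negative else hi), True
```

For example, `f_to_int(-2.9, 32)` yields `(-2, False)`, while `f_to_int(2147483648.0, 32)` reports the invalid condition.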


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute a FqTOx or FqTOi instruction 
causes an illegal_instruction exception, allowing privileged 
software to emulate the instruction. 


An attempt to execute an F<s|d|q>TO<i|x> instruction when instruction bits 18:14
are nonzero causes an illegal_instruction exception.

If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an F<s|d|q>TO<i|x> instruction causes an fp_disabled
exception.

If the FPU is enabled, FqTOi and FqTOx cause fp_exception_other (with FSR.ftt =
unimplemented_FPop), since those instructions are not implemented in hardware in
UltraSPARC Architecture 2005 implementations.

If the floating-point operand's value is too large to be converted to an integer of the
specified size or is a NaN or infinity, then an fp_exception_ieee_754 "invalid"
exception occurs. The value written into the floating-point register(s) specified by rd
in these cases is as defined in Integer Overflow Definition on page 367.

For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754-1985
Requirements for UltraSPARC Architecture 2005.

Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FqTOx, FqTOi only))
fp_exception_ieee_754 (NV, NX)
F<s|d|q>TO<s|d|q>





7.42 Convert Between Floating-Point Formats 


Instruction op3 opf Operation s1 s2 d Assembly Language Syntax Class
FsTOd 110100 011001001 Convert Single to Double — f32 f64 fstod freg_rs2, freg_rd A1
FsTOq 110100 011001101 Convert Single to Quad — f32 f128 fstoq freg_rs2, freg_rd C3
FdTOs 110100 011000110 Convert Double to Single — f64 f32 fdtos freg_rs2, freg_rd A1
FdTOq 110100 011001110 Convert Double to Quad — f64 f128 fdtoq freg_rs2, freg_rd C3
FqTOs 110100 011000111 Convert Quad to Single — f128 f32 fqtos freg_rs2, freg_rd C3
FqTOd 110100 011001011 Convert Quad to Double — f128 f64 fqtod freg_rs2, freg_rd C3

10 rd op3 — opf rs2
31 30 29 25 24 19 18 14 13 5 4 0


Description These instructions convert the floating-point operand in the floating-point register(s) 
specified by rs2 to a floating-point number in the destination format. They write the 
result into the floating-point register(s) specified by rd. 


The value of FSR.rd determines how rounding is performed by these instructions. 


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute a FsTOq, FdTOq, FqTOs, or 
FqTOd instruction causes an illegal_instruction exception, allowing 
privileged software to emulate the instruction. 


An attempt to execute an F<s|d|q>TO<s|d|q> instruction when instruction bits
18:14 are nonzero causes an illegal_instruction exception.

If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an F<s|d|q>TO<s|d|q> instruction causes an fp_disabled
exception.

If the FPU is enabled, FsTOq, FdTOq, FqTOs, and FqTOd cause fp_exception_other
(with FSR.ftt = unimplemented_FPop), since those instructions are not implemented
in hardware in UltraSPARC Architecture 2005 implementations.

FqTOd, FqTOs, and FdTOs (the “narrowing” conversion instructions) can cause
fp_exception_ieee_754 OF, UF, and NX exceptions. FdTOq, FsTOq, and FsTOd (the
“widening” conversion instructions) cannot.
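The asymmetry between narrowing and widening can be illustrated with a small Python sketch of a double-to-single conversion. It is a rough model under stated assumptions: the function name is hypothetical, only round-to-nearest is modeled (FSR.rd is not consulted), the overflow result of infinity is what round-to-nearest would produce, and underflow (UF) detection is omitted.

```python
import struct


def narrow_to_single(value: float):
    """Round a double to single precision, FdTOs-style.

    Returns (result, flags), where flags is a subset of {"OF", "NX"}.
    """
    try:
        # Packing as a 32-bit float rounds the value to single precision.
        result = struct.unpack(">f", struct.pack(">f", value))[0]
    except OverflowError:
        # Finite operand too large for single precision (assumed result: infinity).
        return (float("inf") if value > 0 else float("-inf")), {"OF", "NX"}
    flags = set()
    if value == value and result != value:  # skip NaN; a changed value is inexact
        flags.add("NX")
    return result, flags
```

Feeding a result of `narrow_to_single` back in returns it unchanged with no flags, which reflects why the widening direction cannot raise OF, UF, or NX: every single-precision value is exactly representable in double precision.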


Any of these six instructions can trigger an fp_exception_ieee_754 NV exception if
the source operand is a signalling NaN.

Note | For FdTOs and FsTOd, an fp_exception_other with
FSR.ftt = unfinished_FPop can occur if implementation-dependent
conditions are detected during the conversion operation.

For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754-1985
Requirements for UltraSPARC Architecture 2005.

Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FsTOq, FqTOs, FdTOq,
and FqTOd only))
fp_exception_other (FSR.ftt = unfinished_FPop)
fp_exception_ieee_754 (NV)
fp_exception_ieee_754 (OF, UF, NX (FqTOd, FqTOs, and FdTOs))
FSUB





7.43 Floating-Point Subtract

Instruction op3 opf Operation Assembly Language Syntax Class
FSUBs 11 0100 0 0100 0101 Subtract Single fsubs freg_rs1, freg_rs2, freg_rd A1
FSUBd 11 0100 0 0100 0110 Subtract Double fsubd freg_rs1, freg_rs2, freg_rd A1
FSUBq 11 0100 0 0100 0111 Subtract Quad fsubq freg_rs1, freg_rs2, freg_rd C3

10 rd op3 rs1 opf rs2
31 30 29 25 24 19 18 14 13 5 4 0
Description The floating-point subtract instructions subtract the floating-point register(s) 
specified by the rs2 field from the floating-point register(s) specified by the rs1 field. 
The instructions then write the difference into the floating-point register(s) specified 
by the rd field. 
Rounding is performed as specified by FSR.rd. 

Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FSUBq instruction causes an
illegal_instruction exception, allowing privileged software to
emulate the instruction.

If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FSUB instruction causes an fp_disabled exception.

If the FPU is enabled, FSUBq causes an fp_exception_other (with FSR.ftt =
unimplemented_FPop), since that instruction is not implemented in hardware in
UltraSPARC Architecture 2005 implementations.

Note | An fp_exception_other with FSR.ftt = unfinished_FPop can occur 
if the operation detects unusual, implementation-specific 
conditions (for FSUBs or FSUBd). 

For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754- 
1985 Requirements for UltraSPARC Architecture 2005. 
Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FSUBq))
fp_exception_other (FSR.ftt = unfinished_FPop)
fp_exception_ieee_754 (OF, UF, NX, NV)
FxTO<s|d|q>





7.44 Convert 64-bit Integer to Floating Point 


Instruction op3 opf Operation s1 s2 d Assembly Language Syntax Class
FxTOs 110100 010000100 Convert 64-bit Integer to Single — i64 f32 fxtos freg_rs2, freg_rd A1
FxTOd 110100 010001000 Convert 64-bit Integer to Double — i64 f64 fxtod freg_rs2, freg_rd A1
FxTOq 110100 010001100 Convert 64-bit Integer to Quad — i64 f128 fxtoq freg_rs2, freg_rd C3

10 rd op3 — opf rs2
31 30 29 25 24 19 18 14 13 5 4 0

Description FxTOs, FxTOd, and FxTOq convert the 64-bit signed integer operand in the floating-point
register F_D[rs2] into a floating-point number in the destination format.


All write their result into the floating-point register(s) specified by rd. 
The value of FSR.rd determines how rounding is performed by FxTOs and FxTOd. 
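Rounding matters here because not every 64-bit integer has an exact double-precision (or single-precision) representation. The Python sketch below models FxTOd under round-to-nearest only, which is an assumption; other FSR.rd settings are not modeled, and the function name is illustrative. It reports when the result is inexact, the condition associated with NX.

```python
def fxtod(x: int):
    """Convert a signed 64-bit integer to double, reporting inexactness.

    Returns (result, inexact). Only round-to-nearest is modeled.
    """
    if not -(1 << 63) <= x < (1 << 63):
        raise ValueError("operand is not a signed 64-bit integer")
    result = float(x)               # int -> double conversion, rounded to nearest
    return result, int(result) != x  # exact integer comparison detects rounding
```

Integers of magnitude up to 2^53 always convert exactly; beyond that, only integers whose low-order bits are zero do.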


Note | UltraSPARC Architecture 2005 processors do not implement in 
hardware instructions that refer to quad-precision floating-point 
registers. An attempt to execute an FxTOq instruction causes an
illegal_instruction exception, allowing privileged software to
emulate the instruction.

An attempt to execute an FxTO<s|d|q> instruction when instruction bits 18:14 are
nonzero causes an illegal_instruction exception.

If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an FxTO<s|d|q> instruction causes an fp_disabled exception.

If the FPU is enabled, FxTOq causes an fp_exception_other (with FSR.ftt =
unimplemented_FPop), since that instruction is not implemented in hardware in
UltraSPARC Architecture 2005 implementations.

For more details regarding floating-point exceptions, see Chapter 8, IEEE Std 754-1985
Requirements for UltraSPARC Architecture 2005.

Exceptions illegal_instruction
fp_disabled
fp_exception_other (FSR.ftt = unimplemented_FPop (FxTOq only))
fp_exception_ieee_754 (NX (FxTOs and FxTOd only))
ILLTRAP





7.45 Illegal Instruction Trap 


Instruction op op2 Operation Assembly Language Syntax Class
ILLTRAP 00 000 illegal_instruction trap illtrap const22 A1

00 — 000 const22
31 30 29 25 24 22 21 0


Description The ILLTRAP instruction causes an illegal_instruction exception. The const22 value 
in the instruction is ignored by the virtual processor; specifically, this field is not 
reserved by the architecture for any future use. 


V9 Compatibility | Except for its name, this instruction is identical to the SPARC V8 
Note | UNIMP instruction. 


An attempt to execute an ILLTRAP instruction when reserved instruction bits 29:25 
are nonzero (also) causes an illegal_instruction exception. However, software should 
not rely on this behavior, because a future version of the architecture may use 
nonzero values of bits 29:25 to encode other functions. 
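The ILLTRAP encoding is simple enough to sketch directly from the field layout above: op occupies bits 31:30, op2 bits 24:22, and const22 bits 21:0. The helper names below are illustrative assumptions. In line with the caution about reserved bits, the encoder always leaves bits 29:25 zero.

```python
def encode_illtrap(const22: int) -> int:
    """Build a 32-bit ILLTRAP instruction word with reserved bits 29:25 zero."""
    if not 0 <= const22 < (1 << 22):
        raise ValueError("const22 must fit in 22 bits")
    # op = 00 and op2 = 000, so the word is just the const22 field.
    return const22


def is_illtrap(word: int) -> bool:
    """True if the word has op = 00 (bits 31:30) and op2 = 000 (bits 24:22)."""
    op = (word >> 30) & 0x3
    op2 = (word >> 22) & 0x7
    return op == 0 and op2 == 0
```

Note that `is_illtrap` deliberately ignores bits 29:25, so it recognizes the opcode whether or not the reserved field is zero.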


Exceptions illegal_instruction
IMPDEP





7.46 Implementation-Dependent Instructions

Instruction op3 op4 Operation Class
IMPDEP1 11 0110 (any) Implementation-Dependent Instruction 1 N3
IMPDEP2A 11 0111 0 Implementation-Dependent Instruction 2A N3
IMPDEP2B 11 0111 1,2,3 Implementation-Dependent Instruction 2B N3

10 impl. dep. op3 impl. dep. op4 impl. dep.
31 30 29 25 24 19 18 7 6 5 4 0

Description IMPL. DEP. #106-V9: The IMPDEP2A opcode space is completely implementation
dependent. Implementation-dependent aspects of IMPDEP2A instructions include
their operation, the interpretation of bits 29-25, 18-7, and 4-0 in their encodings,
and which (if any) exceptions they may cause.

IMPDEP2B opcodes are reserved; see IMPDEP2B Opcodes on page 224.

See “Implementation-Dependent and Reserved Opcodes” in the “Extending the
UltraSPARC Architecture” section of the separate document UltraSPARC Architecture
Application Notes, for information about extending the instruction set by means of
implementation-dependent instructions.

Compatibility | IMPDEP2A and IMPDEP2B are subsets of the SPARC V9
Note | IMPDEP2 opcode space. The IMPDEP1 opcode space from
SPARC V9 is occupied by various VIS instructions in the
UltraSPARC Architecture, so it should not be used for
implementation-dependent instructions.

Exceptions implementation-dependent (IMPDEP2A, IMPDEP2B)

7.46.1 IMPDEP1 Opcodes [VIS 1, 2]


All operands of instructions using IMPDEP1 opcodes are in floating-point registers,
unless otherwise specified. Pixel values are stored in single-precision floating-point
registers and fixed values are stored in double-precision floating-point registers,
unless otherwise specified.

Note | All IMPDEP1 instructions, regardless of whether they use
floating-point registers or integer registers, leave FSR.cexc and
FSR.aexc unchanged.

7.46.1.1 Opcode Formats

Most of the VIS instruction set maps to the opcode space reserved for the
Implementation-Dependent Instruction 1 (op3 = IMPDEP1 = 36₁₆) instructions.
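The division of the implementation-dependent opcode space by op3 and op4, as given in the table at the start of this section, can be sketched as a small classifier. The function name is an illustrative assumption; the field positions (op in bits 31:30, op3 in bits 24:19, op4 in bits 6:5) follow the format diagram.

```python
def classify_impdep(word: int):
    """Return 'IMPDEP1', 'IMPDEP2A', 'IMPDEP2B', or None for a 32-bit word."""
    op = (word >> 30) & 0x3
    op3 = (word >> 19) & 0x3F
    op4 = (word >> 5) & 0x3
    if op != 0b10:
        return None
    if op3 == 0b110110:      # 36 hex: IMPDEP1, any op4 (VIS instructions)
        return "IMPDEP1"
    if op3 == 0b110111:      # 37 hex: split by op4
        return "IMPDEP2A" if op4 == 0 else "IMPDEP2B"
    return None
```

A decoder built this way would treat every `IMPDEP2B` result as a reserved opcode.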


7.46.2 IMPDEP2B Opcodes

No instructions are currently encoded in the IMPDEP2B opcode space; it is a
reserved opcode space.
INVALW





7.47 Mark Register Window Sets as "Invalid" 





Instruction Operation Assembly Language Syntax Class
INVALW^P Mark all register window sets as “invalid” invalw A1

10 fcn = 0 0101 11 0001 —
31 30 29 25 24 19 18 0


Description The INVALW instruction marks all register window sets as “invalid”; specifically, it
atomically performs the following operations:

CANSAVE ← (N_REG_WINDOWS − 2)
CANRESTORE ← 0
OTHERWIN ← 0

Programming | INVALW marks all windows as invalid; after executing INVALW,
Note | N_REG_WINDOWS − 2 SAVEs can be performed without generating a
spill trap.

In an UltraSPARC Architecture 2005 implementation, this instruction is not
implemented in hardware, causes an illegal_instruction exception, and is emulated
in software.
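Because INVALW is emulated in software, its effect reduces to three register updates. A minimal Python sketch of that state change follows; the `WindowState` class and its field names are illustrative assumptions that mirror the register names used above.

```python
from dataclasses import dataclass


@dataclass
class WindowState:
    n_reg_windows: int  # implementation constant N_REG_WINDOWS
    cansave: int
    canrestore: int
    otherwin: int


def invalw(state: WindowState) -> WindowState:
    """Apply the INVALW register updates to the window-management state."""
    state.cansave = state.n_reg_windows - 2
    state.canrestore = 0
    state.otherwin = 0
    return state
```

With `n_reg_windows = 8`, for instance, CANSAVE becomes 6 regardless of the previous window state.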


Exceptions illegal_instruction (not implemented in hardware in UltraSPARC Architecture 2005) 


See Also ALLCLEAN on page 136 
NORMALW on page 274 
OTHERW on page 276 
RESTORED on page 294 
SAVED on page 302
JMPL





7.48 Jump and Link 


Instruction op3 Operation Assembly Language Syntax Class
JMPL 11 1000 Jump and Link jmpl address, reg_rd A1

10 rd op3 rs1 i=0 — rs2
10 rd op3 rs1 i=1 simm13
31 30 29 25 24 19 18 14 13 12 5 4 0


Description The JMPL instruction causes a register-indirect delayed control transfer to the
address given by “R[rs1] + R[rs2]” if i = 0, or “R[rs1] + sign_ext(simm13)” if
i = 1.

The JMPL instruction copies the PC, which contains the address of the JMPL
instruction, into register R[rd].

An attempt to execute a JMPL instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.

If either of the low-order two bits of the jump address is nonzero, a
mem_address_not_aligned exception occurs.
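The target computation, link-register write, and alignment check can be sketched in Python as below. This is an illustration under stated assumptions: the function names are hypothetical, the register file is a plain 32-element list, R[0] is treated as always zero, the delay slot is not modeled, and PSTATE.am masking is omitted.

```python
MASK64 = (1 << 64) - 1


def sign_ext13(simm13: int) -> int:
    """Sign-extend a 13-bit immediate."""
    simm13 &= 0x1FFF
    return simm13 - 0x2000 if simm13 & 0x1000 else simm13


def jmpl(pc: int, regs: list, rs1: int, rd: int, i: int,
         rs2: int = 0, simm13: int = 0) -> int:
    """Return the jump target and copy the JMPL's own PC into regs[rd]."""
    offset = sign_ext13(simm13) if i else regs[rs2]
    target = (regs[rs1] + offset) & MASK64
    if target & 0x3:
        # A trap is taken, so the link register is left unmodified.
        raise ValueError("mem_address_not_aligned")
    if rd != 0:              # assumption: writes to R[0] are discarded
        regs[rd] = pc
    return target
```

Calling `jmpl` with `rs1 = 31`, `rd = 0`, `i = 1`, and `simm13 = 8` models the typical return from a nonleaf routine.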


Programming | A JMPL instruction with rd = 15 functions as a register-indirect
Notes | call using the standard link register.

JMPL with rd = 0 can be used to return from a subroutine. The
typical return address is “R[31] + 8” if a nonleaf routine (one that
uses the SAVE instruction) is entered by a CALL instruction, or
“R[15] + 8” if a leaf routine (one that does not use the SAVE
instruction) is entered by a CALL instruction or by a JMPL
instruction with rd = 15.

When PSTATE.am = 1, the more-significant 32 bits of the target instruction address
are masked out (set to 0) before being sent to the memory system or being written
into R[rd]. (closed impl. dep. #125-V9-Cs10)

Exceptions illegal_instruction
mem_address_not_aligned

See Also CALL on page 150
Bicc on page 142
BPcc on page 148
LD





7.49 Load Integer

Instruction op3 Operation Assembly Language Syntax Class
LDSB 00 1001 Load Signed Byte ldsb [address], reg_rd A1
LDSH 00 1010 Load Signed Halfword ldsh [address], reg_rd A1
LDSW 00 1000 Load Signed Word ldsw [address], reg_rd A1
LDUB 00 0001 Load Unsigned Byte ldub [address], reg_rd A1
LDUH 00 0010 Load Unsigned Halfword lduh [address], reg_rd A1
LDUW 00 0000 Load Unsigned Word lduw† [address], reg_rd A1
LDX 00 1011 Load Extended Word ldx [address], reg_rd A1

† synonym: ld

11 rd op3 rs1 i=0 — rs2
11 rd op3 rs1 i=1 simm13
31 30 29 25 24 19 18 14 13 12 5 4 0

Description The load integer instructions copy a byte, a halfword, a word, or an extended word
from memory. All copy the fetched value into R[rd]. A fetched byte, halfword, or
word is right-justified in the destination register R[rd]; it is either sign-extended or
zero-filled on the left, depending on whether the opcode specifies a signed or
unsigned operation, respectively.


Load integer instructions access memory using the implicit ASI (see page 104). The
effective address is “R[rs1] + R[rs2]” if i = 0, or “R[rs1] + sign_ext(simm13)” if i = 1.

A successful load (notably, load extended) instruction operates atomically.

An attempt to execute a load integer instruction when i = 0 and instruction bits 12:5
are nonzero causes an illegal_instruction exception.

If the effective address is not halfword-aligned, an attempt to execute an LDUH or
LDSH causes a mem_address_not_aligned exception. If the effective address is not
word-aligned, an attempt to execute an LDUW or LDSW instruction causes a
mem_address_not_aligned exception. If the effective address is not doubleword-aligned,
an attempt to execute an LDX instruction causes a mem_address_not_aligned exception.
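The combination of alignment checking, right-justification, and sign extension or zero fill can be sketched in Python as follows. The sketch assumes a big-endian access (the byte order actually depends on the ASI in effect), models memory as a `bytes` object indexed from address 0, and uses a hypothetical function name and lookup table.

```python
MASK64 = (1 << 64) - 1

# opcode -> (access size in bytes, signed?)
LOADS = {
    "LDSB": (1, True), "LDSH": (2, True), "LDSW": (4, True),
    "LDUB": (1, False), "LDUH": (2, False), "LDUW": (4, False),
    "LDX": (8, False),
}


def load_integer(memory: bytes, addr: int, opcode: str) -> int:
    """Return the 64-bit value that the load would place in R[rd]."""
    size, signed = LOADS[opcode]
    if addr % size:
        raise ValueError("mem_address_not_aligned")
    value = int.from_bytes(memory[addr:addr + size], "big", signed=signed)
    return value & MASK64  # a negative value becomes its 64-bit sign extension
```

Byte loads can never fail the alignment check, which matches the "all except LDSB, LDUB" qualification on the exception.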


V8 Compatibility | The SPARC V8 LD instruction was renamed LDUW in the SPARC 
Note | V9 architecture. The LDSW instruction was new in the SPARC V9 
architecture. 


A load integer twin word (LDTW) instruction exists, but is deprecated; see Load
Integer Twin Word on page 250 for details.

Exceptions illegal_instruction
mem_address_not_aligned (all except LDSB, LDUB)
VA_watchpoint
data_access_exception
LDA





7.50 Load Integer from Alternate Space

Instruction op3 Operation Assembly Language Syntax Class
LDSBA^P_ASI 011001 Load Signed Byte from Alternate Space ldsba [regaddr] imm_asi, reg_rd A1
    ldsba [reg_plus_imm] %asi, reg_rd
LDSHA^P_ASI 011010 Load Signed Halfword from Alternate Space ldsha [regaddr] imm_asi, reg_rd A1
    ldsha [reg_plus_imm] %asi, reg_rd
LDSWA^P_ASI 011000 Load Signed Word from Alternate Space ldswa [regaddr] imm_asi, reg_rd A1
    ldswa [reg_plus_imm] %asi, reg_rd
LDUBA^P_ASI 010001 Load Unsigned Byte from Alternate Space lduba [regaddr] imm_asi, reg_rd A1
    lduba [reg_plus_imm] %asi, reg_rd
LDUHA^P_ASI 010010 Load Unsigned Halfword from Alternate Space lduha [regaddr] imm_asi, reg_rd A1
    lduha [reg_plus_imm] %asi, reg_rd
LDUWA^P_ASI 010000 Load Unsigned Word from Alternate Space lduwa† [regaddr] imm_asi, reg_rd A1
    lduwa [reg_plus_imm] %asi, reg_rd
LDXA^P_ASI 011011 Load Extended Word from Alternate Space ldxa [regaddr] imm_asi, reg_rd A1
    ldxa [reg_plus_imm] %asi, reg_rd

† synonym: lda

11 rd op3 rs1 i=0 imm_asi rs2
11 rd op3 rs1 i=1 simm13
31 30 29 25 24 19 18 14 13 12 5 4 0

Description The load integer from alternate space instructions copy a byte, a halfword, a word,
or an extended word from memory. All copy the fetched value into R[rd]. A fetched
byte, halfword, or word is right-justified in the destination register R[rd]; it is either
sign-extended or zero-filled on the left, depending on whether the opcode specifies a
signed or unsigned operation, respectively.


The load integer from alternate space instructions contain the address space 
identifier (ASI) to be used for the load in the imm_asi field if i = 0, or in the ASI 
register if i = 1. The access is privileged if bit 7 of the ASI is 0; otherwise, it is not 
privileged. The effective address for these instructions is “R[rs1] + R[rs2]” if i = 0, or
“R[rs1] + sign_ext(simm13)” if i = 1.


A successful load (notably, load extended) instruction operates atomically. 


A load integer twin word from alternate space (LDTWA) instruction exists, but is 
deprecated; see Load Integer Twin Word from Alternate Space on page 252 for details. 


An attempt to execute a load integer from alternate space instruction when i = 0 and
instruction bits 12:5 are nonzero causes an illegal_instruction exception.

If the effective address is not halfword-aligned, an attempt to execute an LDUHA or
LDSHA instruction causes a mem_address_not_aligned exception. If the effective
address is not word-aligned, an attempt to execute an LDUWA or LDSWA
instruction causes a mem_address_not_aligned exception. If the effective address is
not doubleword-aligned, an attempt to execute an LDXA instruction causes a
mem_address_not_aligned exception.

In nonprivileged mode (PSTATE.priv = 0), if bit 7 of the ASI is 0, these instructions
cause a privileged_action exception. In privileged mode (PSTATE.priv = 1), if the ASI
is in the range 30₁₆ to 7F₁₆, these instructions cause a privileged_action exception.
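The two privilege rules in the preceding paragraph can be sketched as a single check. The function name and the boolean `privileged` parameter (standing in for PSTATE.priv) are illustrative assumptions; only the nonprivileged and privileged modes named in that paragraph are modeled.

```python
def lda_privilege_check(asi: int, privileged: bool):
    """Return 'privileged_action' if the access is disallowed, else None."""
    if not privileged and (asi & 0x80) == 0:
        return "privileged_action"   # bit 7 clear: ASI restricted to privileged use
    if privileged and 0x30 <= asi <= 0x7F:
        return "privileged_action"   # range not available to privileged mode
    return None
```

A result of `None` means only that this check passes; the ASI may still be invalid for the particular instruction and cause a data_access_exception.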


LDSBA, LDSHA, LDSWA, LDUBA, LDUHA, and LDUWA can be used with any
of the following ASIs, subject to the privilege mode rules described for the
privileged_action exception above. Use of any other ASI with these instructions
causes a data_access_exception exception.


ASIs valid for LDSBA, LDSHA, LDSWA, LDUBA, LDUHA, and LDUWA

ASI_NUCLEUS ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL ASI_REAL_LITTLE
ASI_REAL_IO ASI_REAL_IO_LITTLE
ASI_PRIMARY ASI_PRIMARY_LITTLE
ASI_SECONDARY ASI_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT ASI_SECONDARY_NO_FAULT_LITTLE


LDXA can be used with any ASI (including, but not limited to, the above list), unless
it either (a) violates the privilege mode rules described for the privileged_action
exception above or (b) is used with any of the following ASIs, which causes a
data_access_exception exception.





ASIs invalid for LDXA (cause data_access_exception exception)

24₁₆ (aliased to 27₁₆, ASI_TWINX_N) 2C₁₆ (aliased to 2F₁₆, ASI_TWINX_NL)
22₁₆ (ASI_TWINX_AIUP) 2A₁₆ (ASI_TWINX_AIUP_L)
23₁₆ (ASI_TWINX_AIUS) 2B₁₆ (ASI_TWINX_AIUS_L)
26₁₆ (ASI_TWINX_REAL) 2E₁₆ (ASI_TWINX_REAL_L)
27₁₆ (ASI_TWINX_N) 2F₁₆ (ASI_TWINX_NL)
ASI_BLOCK_AS_IF_USER_PRIMARY ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE
ASI_BLOCK_AS_IF_USER_SECONDARY ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE
ASI_PST8_PRIMARY ASI_PST8_PRIMARY_LITTLE
ASI_PST8_SECONDARY ASI_PST8_SECONDARY_LITTLE
ASI_PST16_PRIMARY ASI_PST16_PRIMARY_LITTLE
ASI_PST16_SECONDARY ASI_PST16_SECONDARY_LITTLE
ASI_PST32_PRIMARY ASI_PST32_PRIMARY_LITTLE
ASI_PST32_SECONDARY ASI_PST32_SECONDARY_LITTLE

ASI_FL8_PRIMARY ASI_FL8_PRIMARY_LITTLE
ASI_FL8_SECONDARY ASI_FL8_SECONDARY_LITTLE
ASI_FL16_PRIMARY ASI_FL16_PRIMARY_LITTLE
ASI_FL16_SECONDARY ASI_FL16_SECONDARY_LITTLE
ASI_BLOCK_COMMIT_PRIMARY ASI_BLOCK_COMMIT_SECONDARY
E2₁₆ (ASI_TWINX_P) EA₁₆ (ASI_TWINX_PL)
E3₁₆ (ASI_TWINX_S) EB₁₆ (ASI_TWINX_SL)
ASI_BLOCK_PRIMARY ASI_BLOCK_PRIMARY_LITTLE
ASI_BLOCK_SECONDARY ASI_BLOCK_SECONDARY_LITTLE

Exceptions mem_address_not_aligned (all except LDSBA and LDUBA)
privileged_action
VA_watchpoint
data_access_exception

See Also LD on page 227
STA on page 314

7.51 Block Load


The LDBLOCKF instructions are deprecated and should not be used in new
software. A sequence of LDX instructions should be used instead.
The LDBLOCKF instruction is intended to be a processor-specific instruction,
which may or may not be implemented in future UltraSPARC Architecture
implementations. Therefore, it should only be used in platform-specific
dynamically-linked libraries or in software created by a runtime code generator
that is aware of the specific virtual processor implementation on which it is
executing.

Instruction ASI Value Operation Assembly Language Syntax Class
LDBLOCKF 16₁₆ 64-byte block load from primary address space, user privilege ldda [regaddr] #ASI_BLK_AIUP, freg_rd D2
    ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF 17₁₆ 64-byte block load from secondary address space, user privilege ldda [regaddr] #ASI_BLK_AIUS, freg_rd D2
    ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF 1E₁₆ 64-byte block load from primary address space, little-endian, user privilege ldda [regaddr] #ASI_BLK_AIUPL, freg_rd D2
    ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF 1F₁₆ 64-byte block load from secondary address space, little-endian, user privilege ldda [regaddr] #ASI_BLK_AIUSL, freg_rd D2
    ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF F0₁₆ 64-byte block load from primary address space ldda [regaddr] #ASI_BLK_P, freg_rd D2
    ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF F1₁₆ 64-byte block load from secondary address space ldda [regaddr] #ASI_BLK_S, freg_rd D2
    ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF F8₁₆ 64-byte block load from primary address space, little-endian ldda [regaddr] #ASI_BLK_PL, freg_rd D2
    ldda [reg_plus_imm] %asi, freg_rd
LDBLOCKF F9₁₆ 64-byte block load from secondary address space, little-endian ldda [regaddr] #ASI_BLK_SL, freg_rd D2
    ldda [reg_plus_imm] %asi, freg_rd

11 rd 110011 rs1 i=0 imm_asi rs2
11 rd 110011 rs1 i=1 simm13
31 30 29 25 24 19 18 14 13 12 5 4 0


Description A block load (LDBLOCKF) instruction uses one of several special block-transfer
ASIs. Block transfer ASIs allow block loads to be performed accessing the same
address space as normal loads. Little-endian ASIs (those with an ‘L’ suffix) access
data in little-endian format; otherwise, the access is assumed to be big-endian. Byte
swapping is performed separately for each of the eight 64-bit (double-precision) F
registers used by the instruction.


A block load instruction loads 64 bytes of data from a 64-byte aligned memory area 
into the eight double-precision floating-point registers specified by rd. The lowest- 
addressed eight bytes in memory are loaded into the lowest-numbered 64-bit 
(double-precision) destination F register. 


A block load only guarantees atomicity for each 64-bit (8-byte) portion of the 64 
bytes it accesses. 


The block load instruction is intended to support fast block-copy operations. 
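The data movement described above, including the per-register byte swapping for little-endian ASIs, can be sketched in Python. This is a simplified model: the function name is hypothetical, memory is a `bytes` object, destination registers are identified by their ordinal among the 32 double-precision registers (0 to 31) rather than by the encoded rd field, and the two checks shown anticipate the alignment requirements stated later in this section. Memory ordering, privilege, and watchpoint behavior are not modeled.

```python
def ldblockf(memory: bytes, addr: int, first_dreg: int,
             little_endian: bool = False) -> dict:
    """Return {double-register ordinal: 64-bit value} for a 64-byte block load."""
    if first_dreg % 8:
        raise ValueError("illegal_instruction")      # not on an 8-register boundary
    if addr & 0x3F:
        raise ValueError("mem_address_not_aligned")  # low 6 address bits nonzero
    order = "little" if little_endian else "big"
    regs = {}
    for k in range(8):
        chunk = memory[addr + 8 * k: addr + 8 * k + 8]
        # Lowest-addressed 8 bytes go to the lowest-numbered register;
        # byte order is applied within each 64-bit register separately.
        regs[first_dreg + k] = int.from_bytes(chunk, order)
    return regs
```

Note that the little-endian variant reverses bytes only within each 8-byte unit; the assignment of units to registers is the same in both byte orders.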


Programming | LDBLOCKF is intended to be a processor-specific instruction
Note | (see the warning at the top of page 232). If LDBLOCKF must be
used in software intended to be portable across current and
previous processor implementations, then it must be coded to
work in the face of any implementation variation that is
permitted by implementation dependency #410-S10, described
below.


IMPL. DEP. #410-S10: The following aspects of the behavior of block load
(LDBLOCKF) instructions are implementation dependent:

■ What memory ordering model is used by LDBLOCKF (LDBLOCKF is not
required to follow TSO memory ordering)

■ Whether LDBLOCKF follows memory ordering with respect to stores (including
block stores), including whether the virtual processor detects read-after-write and
write-after-read hazards to overlapping addresses

■ Whether LDBLOCKF appears to execute out of order, or follow LoadLoad
ordering (with respect to older loads, younger loads, and other LDBLOCKFs)

■ Whether LDBLOCKF follows register-dependency interlocks, as do ordinary load
instructions

■ Whether LDBLOCKFs to non-cacheable locations are (a) strictly ordered, (b) not
strictly ordered and cause an illegal_instruction exception, or (c) not strictly
ordered and silently execute without causing an exception (option (c) is strongly
discouraged)

■ Whether VA_watchpoint exceptions are recognized on accesses to all 64 bytes of a
LDBLOCKF (the recommended behavior), or only on the first eight bytes

■ Whether the MMU ignores the side-effect bit (TTE.e) for LDBLOCKF accesses

Programming | If ordering with respect to earlier stores is important (for
Note | example, a block load that overlaps a previous store) and
read-after-write hazards are not detected, there must be a MEMBAR
#StoreLoad instruction between earlier stores and a block
load.

If ordering with respect to later stores is important, there must
be a MEMBAR #LoadStore instruction between a block load
and subsequent stores.

If LoadLoad ordering with respect to older or younger loads or
other block load instructions is important and is not provided
by an implementation, an intervening MEMBAR #LoadLoad is
required.

For further restrictions on the behavior of the block load instruction, see
implementation-specific processor documentation.

Implementation | In all UltraSPARC Architecture implementations, the MMU
Note | ignores the side-effect bit (TTE.e) for LDBLOCKF accesses (impl.
dep. #410-S10).

Exceptions. An illegal_instruction exception occurs if LDBLOCKF's floating-point
destination registers are not aligned on an eight-double-precision register boundary.

If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an LDBLOCKF instruction causes an fp_disabled exception.

If the least significant 6 bits of the effective memory address in an LDBLOCKF
instruction are nonzero, a mem_address_not_aligned exception occurs.

In nonprivileged mode (PSTATE.priv = 0), if bit 7 of the ASI is 0 (ASIs 16₁₆, 17₁₆,
1E₁₆, and 1F₁₆), LDBLOCKF causes a privileged_action exception.

An access caused by LDBLOCKF may trigger a VA_watchpoint exception (impl. dep.
#410-S10).

Implementation | LDBLOCKF shares an opcode with LDDFA and LDSHORTF; it
Note | is distinguished by the ASI used.

Exceptions illegal_instruction
fp_disabled
mem_address_not_aligned
privileged_action
VA_watchpoint (impl. dep. #410-S10)
data_access_exception

See Also STBLOCKF on page 317
LDF / LDDF / LDQF





7.52 Load Floating-Point Register 


Instruction op3 rd Operation Assembly Language Syntax Class
LDF 10 0000 0-31 Load Floating-Point Register ld [address], freg_rd A1
LDDF 10 0011 † Load Double Floating-Point Register ldd [address], freg_rd A1
LDQF 10 0010 † Load Quad Floating-Point Register ldq [address], freg_rd C3

† Encoded floating-point register value, as described on page 51.

11 rd op3 rs1 i=0 — rs2
11 rd op3 rs1 i=1 simm13
31 30 29 25 24 19 18 14 13 12 5 4 0


Description The load single floating-point instruction (LDF) copies a word from memory into 32-
bit floating-point destination register F_S[rd].


The load doubleword floating-point instruction (LDDF) copies a word-aligned
doubleword from memory into a 64-bit floating-point destination register, F_D[rd].
The unit of atomicity for LDDF is 4 bytes (one word).


The load quad floating-point instruction (LDQF) copies a word-aligned quadword
from memory into a 128-bit floating-point destination register, F_Q[rd]. The unit of
atomicity for LDQF is 4 bytes (one word).


These load floating-point instructions access memory using the implicit ASI (see 
page 104). 


If i = 0, the effective address for these instructions is “R[rs1] + R[rs2]” and if i = 1,
the effective address is “R[rs1] + sign_ext (simm13)”.


Exceptions. An attempt to execute an LDF, LDDF, or LDQF instruction when i = 0
and instruction bits 12:5 are nonzero causes an illegal_instruction exception.
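

As an illustration only, the effective-address rule and the reserved-field check can be modeled in a few lines of Python. This is a sketch under stated assumptions: the field positions are read off the bit-number row of the instruction format (rs1 in bits 18:14, i in bit 13, rs2 in bits 4:0, simm13 in bits 12:0), and the names sign_ext13, effective_address, and regs are hypothetical helpers, not part of the architecture.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_ext13(value):
    """Sign-extend the low 13 bits of value to a Python int."""
    value &= 0x1FFF
    return value - 0x2000 if value & 0x1000 else value


def effective_address(insn, regs):
    """Return the 64-bit effective address of a load instruction word.

    regs is a hypothetical list of 32 integer register values (R[0..31]).
    """
    rs1 = (insn >> 14) & 0x1F
    i = (insn >> 13) & 0x1
    if i == 0:
        # Bits 12:5 are reserved when i = 0 and must be zero.
        if (insn >> 5) & 0xFF:
            raise ValueError("illegal_instruction")
        offset = regs[insn & 0x1F]      # R[rs2]
    else:
        offset = sign_ext13(insn)       # sign_ext(simm13)
    return (regs[rs1] + offset) & MASK64
```

With R[1] = 0x1000 and simm13 = -8, the sketch yields an effective address of 0xFF8.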


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an LDF, LDDF, or LDQF instruction causes an fp_disabled
exception.


If the effective address is not word-aligned, an attempt to execute an LDF instruction
causes a mem_address_not_aligned exception.


LDDF requires only word alignment. However, if the effective address is word-
aligned but not doubleword-aligned, an attempt to execute an LDDF instruction
causes an LDDF_mem_address_not_aligned exception. In this case, trap handler
software must emulate the LDDF instruction and return (impl. dep. #109-V9-Cs10(a)).


LDQF requires only word alignment. However, if the effective address is word-
aligned but not quadword-aligned, an attempt to execute an LDQF instruction
causes an LDQF_mem_address_not_aligned exception. In this case, trap handler
software must emulate the LDQF instruction and return (impl. dep. #111-V9-Cs10(a)).
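

The alignment rules above can be summarized as a small decision function. The following Python sketch is illustrative only; the function name alignment_exception is hypothetical, and it assumes (the text states this explicitly only for LDF) that an address that is not even word-aligned raises mem_address_not_aligned for LDDF and LDQF as well.

```python
def alignment_exception(mnemonic, address):
    """Return the alignment exception name for a load, or None if aligned."""
    if address % 4:
        # Not word-aligned. Stated for LDF; assumed here for LDDF and LDQF.
        return "mem_address_not_aligned"
    if mnemonic == "LDDF" and address % 8:
        # Word-aligned but not doubleword-aligned: software emulates the load.
        return "LDDF_mem_address_not_aligned"
    if mnemonic == "LDQF" and address % 16:
        # Word-aligned but not quadword-aligned: software emulates the load.
        return "LDQF_mem_address_not_aligned"
    return None
```

For example, an LDDF from address 0x1004 is word-aligned but not doubleword-aligned, so the sketch reports LDDF_mem_address_not_aligned.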


Programming | Some compilers issued sequences of single-precision loads for 
Note | SPARC V8 processor targets when the compiler could not 
determine whether doubleword or quadword operands were 
properly aligned. For SPARC V9 processors, since emulation of 
misaligned loads is expected to be fast, compilers should issue 
sets of single-precision loads only when they can determine that 
doubleword or quadword operands are not properly aligned. 





An attempt to execute an LDQF instruction when rd{1} ≠ 0 causes an
fp_exception_other (FSR.ftt = invalid_fp_register) exception.


Implementation | Since UltraSPARC Architecture 2005 processors do not implement
Note | in hardware instructions (including LDQF) that refer to
quad-precision floating-point registers, the
LDQF_mem_address_not_aligned and fp_exception_other (with
FSR.ftt = invalid_fp_register) exceptions do not occur in
hardware. However, their effects must be emulated by software
when the instruction causes an illegal_instruction exception and
subsequent trap.








Destination Register(s) when Exception Occurs. If a load floating-point 
instruction generates an exception that causes a precise trap, the destination floating- 
point register(s) remain unchanged. 


IMPL. DEP. #44-V8-Cs10(a)(1): If a load floating-point instruction generates an 
exception that causes a non-precise trap, the contents of the destination floating-point 
register(s) remain unchanged or are undefined. 


Exceptions illegal_instruction
fp_disabled
LDDF_mem_address_not_aligned
mem_address_not_aligned
fp_exception_other (FSR.ftt = invalid_fp_register (LDQF only))
VA_watchpoint
data_access_exception


See Also Load Floating-Point from Alternate Space on page 239
Load Floating-Point State Register (Lower) on page 243
Store Floating-Point on page 321
7.53 Load Floating-Point from Alternate Space


Instruction   op3      rd    Operation                         Assembly Language Syntax            Class
LDFA^P_ASI    11 0000  0-31  Load Floating-Point Register      lda  [regaddr] imm_asi, freg_rd     A1
                             from Alternate Space              lda  [reg_plus_imm] %asi, freg_rd
LDDFA^P_ASI   11 0011  †     Load Double Floating-Point        ldda [regaddr] imm_asi, freg_rd     A1
                             Register from Alternate Space     ldda [reg_plus_imm] %asi, freg_rd
LDQFA^P_ASI   11 0010  †     Load Quad Floating-Point          ldqa [regaddr] imm_asi, freg_rd     C3
                             Register from Alternate Space     ldqa [reg_plus_imm] %asi, freg_rd

† Encoded floating-point register value, as described in Floating-Point Register Number Encoding on page 51.


[Instruction format: op (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13); when i = 0, imm_asi occupies bits 12:5 and rs2 occupies bits 4:0; when i = 1, simm13 occupies bits 12:0.]


Description The load single floating-point from alternate space instruction (LDFA) copies a word
from memory into 32-bit floating-point destination register F_S[rd].


The load double floating-point from alternate space instruction (LDDFA) copies a
word-aligned doubleword from memory into a 64-bit floating-point destination
register, F_D[rd]. The unit of atomicity for LDDFA is 4 bytes (one word).


The load quad floating-point from alternate space instruction (LDQFA) copies a
word-aligned quadword from memory into a 128-bit floating-point destination
register, F_Q[rd]. The unit of atomicity for LDQFA is 4 bytes (one word).


If i = 0, these instructions contain the address space identifier (ASI) to be used for the
load in the imm_asi field and the effective address for the instruction is
“R[rs1] + R[rs2]”. If i = 1, the ASI to be used is contained in the ASI register and the
effective address for the instruction is “R[rs1] + sign_ext (simm13)”.


Exceptions. If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no
FPU is present, an attempt to execute an LDFA, LDDFA, or LDQFA instruction
causes an fp_disabled exception.


LDFA causes a mem_address_not_aligned exception if the effective memory address
is not word-aligned.


V9 Compatibility | LDFA, LDDFA, and LDQFA cause a privileged_action exception if
Note | PSTATE.priv = 0 and bit 7 of the ASI is 0.


LDDFA requires only word alignment. However, if the effective address is word-
aligned but not doubleword-aligned, LDDFA causes an
LDDF_mem_address_not_aligned exception. In this case, trap handler software
must emulate the LDDFA instruction and return (impl. dep. #109-V9-Cs10(b)).


LDQFA requires only word alignment. However, if the effective address is word-
aligned but not quadword-aligned, LDQFA causes an
LDQF_mem_address_not_aligned exception. In this case, trap handler software
must emulate the LDQFA instruction and return (impl. dep. #111-V9-Cs10(b)).


An attempt to execute an LDQFA instruction when rd{1} ≠ 0 causes an
fp_exception_other (with FSR.ftt = invalid_fp_register) exception.


Implementation | Since UltraSPARC Architecture 2005 processors do not implement
Note | in hardware instructions (including LDQFA) that refer to
quad-precision floating-point registers, the
LDQF_mem_address_not_aligned and fp_exception_other (with
FSR.ftt = invalid_fp_register) exceptions do not occur in
hardware. However, their effects must be emulated by software
when the instruction causes an illegal_instruction exception and
subsequent trap.


Programming | Some compilers issued sequences of single-precision loads for 
Note | SPARC V8 processor targets when the compiler could not 
determine whether doubleword or quadword operands were 
properly aligned. For SPARC V9 processors, since emulation of 
misaligned loads is expected to be fast, compilers should issue 
sets of single-precision loads only when they can determine that 
doubleword or quadword operands are not properly aligned. 





In nonprivileged mode (PSTATE.priv = 0), if bit 7 of the ASI is 0, this instruction
causes a privileged_action exception. In privileged mode (PSTATE.priv = 1), if the
ASI is in the range 30₁₆ to 7F₁₆, this instruction causes a privileged_action exception.
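

The privilege rule in the preceding paragraph can be expressed as a predicate. The Python sketch below is illustrative only; the function name causes_privileged_action is hypothetical, and the sketch models just the nonprivileged and privileged modes named in the paragraph, not any other mode.

```python
def causes_privileged_action(asi, priv):
    """Return True if an alternate-space access with this ASI would trap.

    asi  -- 8-bit address space identifier
    priv -- value of PSTATE.priv (0 = nonprivileged, 1 = privileged)
    """
    if priv == 0:
        # Nonprivileged code may use only ASIs with bit 7 set (0x80-0xFF).
        return (asi & 0x80) == 0
    # Privileged code traps on ASIs in the range 0x30 to 0x7F.
    return 0x30 <= asi <= 0x7F
```

For example, ASI 0x04 traps when PSTATE.priv = 0 but not when PSTATE.priv = 1, while ASI 0x80 traps in neither mode.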


LDFA and LDQFA can be used with any of the following ASIs, subject to the
privilege mode rules described for the privileged_action exception above. Use of any
other ASI with these instructions causes a data_access_exception exception.


ASIs valid for LDFA and LDQFA

ASI_NUCLEUS                 ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY      ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY    ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                    ASI_REAL_LITTLE
ASI_REAL_IO                 ASI_REAL_IO_LITTLE
ASI_PRIMARY                 ASI_PRIMARY_LITTLE
ASI_SECONDARY               ASI_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT        ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT      ASI_SECONDARY_NO_FAULT_LITTLE
LDDFA can be used with any of the following ASIs, subject to the privilege mode
rules described for the privileged_action exception above. Use of any other ASI with
the LDDFA instruction causes a data_access_exception exception.


ASIs valid for LDDFA

ASI_NUCLEUS                 ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY      ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY    ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                    ASI_REAL_LITTLE
ASI_REAL_IO                 ASI_REAL_IO_LITTLE
ASI_PRIMARY                 ASI_PRIMARY_LITTLE
ASI_SECONDARY               ASI_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT        ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT      ASI_SECONDARY_NO_FAULT_LITTLE


Behavior with Partial Store ASIs. ASIs C0₁₆-C5₁₆ and C8₁₆-CD₁₆ are only
defined for use in Partial Store operations (see page 329). None of them should be
used with LDDFA; however, if any of those ASIs is used with LDDFA, the LDDFA
behaves as follows:


1. IMPL. DEP. #257-U3: If an LDDFA opcode is used with an ASI of C0₁₆-C5₁₆ or
C8₁₆-CD₁₆ (Partial Store ASIs, which are an illegal combination with LDDFA) and
a memory address is specified with less than 8-byte alignment, the virtual
processor generates an exception. It is implementation dependent whether the
generated exception is a data_access_exception, mem_address_not_aligned, or
LDDF_mem_address_not_aligned exception.


2. If the memory address is correctly aligned, the virtual processor generates a
data_access_exception.


Destination Register(s) when Exception Occurs. If a load floating-point
alternate instruction generates an exception that causes a precise trap, the
destination floating-point register(s) remain unchanged.


IMPL. DEP. #44-V8-Cs10(b): If a load floating-point alternate instruction generates
an exception that causes a non-precise trap, it is implementation dependent whether
the contents of the destination floating-point register(s) are undefined or are
guaranteed to remain unchanged.


Implementation | LDDFA shares an opcode with the LDBLOCKF and LDSHORTF
Note | instructions; it is distinguished by the ASI used.


Exceptions illegal_instruction
fp_disabled
LDDF_mem_address_not_aligned
mem_address_not_aligned
fp_exception_other (FSR.ftt = invalid_fp_register (LDQFA only))
privileged_action
VA_watchpoint


See Also Load Floating-Point Register on page 236 
Block Load on page 232 
Store Short Floating-Point on page 332 
Store Floating-Point into Alternate Space on page 323 
LDFSR (Deprecated)


7.54 Load Floating-Point State Register (Lower)


The LDFSR instruction is deprecated and should not be used in new software.
The LDXFSR instruction should be used instead.


Opcode  op3      rd    Operation                                   Assembly Language Syntax  Class
LDFSR   10 0001  0     Load Floating-Point State Register (Lower)  ld [address], %fsr        D2
        10 0001  1-31  (see page 258)


[Instruction format: op (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13); when i = 0, bits 12:5 are reserved and rs2 occupies bits 4:0; when i = 1, simm13 occupies bits 12:0.]


Description The Load Floating-point State Register (Lower) instruction (LDFSR) waits for all
FPop instructions that have not finished execution to complete and then loads a
word from memory into the less significant 32 bits of the FSR. The more-significant
32 bits of FSR are unaffected by LDFSR. LDFSR does not alter the ver, ftt, qne,
reserved, or unimplemented (for example, ns) fields of FSR (see page 58).
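

The effect of LDFSR on the FSR can be pictured as a masked merge. The Python sketch below is illustrative only. The bit positions are assumptions taken from the standard SPARC V9 FSR layout (ver in bits 19:17, ftt in 16:14, qne in bit 13, ns in bit 22, reserved bits 29:28, 21:20, and 12); they are not given in this section and should be checked against the FSR description on page 58. The name ldfsr_merge is hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Assumed field positions in the low 32 bits of FSR (not from this section).
VER = 0x7 << 17
FTT = 0x7 << 14
QNE = 0x1 << 13
NS = 0x1 << 22                                        # treated as unimplemented
RESERVED_LOW = (0x3 << 28) | (0x3 << 20) | (0x1 << 12)
PRESERVED = VER | FTT | QNE | NS | RESERVED_LOW

# Bits of the low word that LDFSR is allowed to change.
WRITABLE = MASK32 & ~PRESERVED


def ldfsr_merge(fsr, word):
    """Return the new 64-bit FSR after loading a 32-bit word with LDFSR.

    The upper 32 bits and the preserved fields keep their old values;
    all other low-word bits are taken from the loaded word.
    """
    return (fsr & ~WRITABLE) | (word & WRITABLE)
```

Under these assumptions, loading an all-ones word leaves the upper 32 bits and the ver field of the old FSR intact and sets only the writable fields.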


Programming | For future compatibility, software should only issue an LDFSR
Note | instruction with a zero value (or a value previously read from
the same field) in any reserved field of FSR.





LDFSR accesses memory using the implicit ASI (see page 108). 


An attempt to execute an LDFSR instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal_instruction exception. 


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an 
attempt to execute an LDFSR instruction causes an fp_disabled exception. 


LDFSR causes a mem_address_not_aligned exception if the effective memory 
address is not word-aligned. 


V8 Compatibility | The SPARC V9 architecture supports two different instructions 
Note | to load the FSR: the (deprecated) SPARC V8 LDFSR instruction 
is defined to load only the less-significant 32 bits of the FSR, 
whereas LDXFSR allows SPARC V9 programs to load all 64 bits 
of the FSR. 
Implementation | LDFSR shares an opcode with the LDXFSR instruction (and
Note | possibly with other implementation-dependent instructions);
they are differentiated by the instruction rd field. An attempt to
execute the op = 11₂, op3 = 10 0001₂ opcode with an invalid rd
value causes an illegal_instruction exception.


Exceptions illegal_instruction
fp_disabled
mem_address_not_aligned
VA_watchpoint


See Also Load Floating-Point Register on page 236 
Load Floating-Point State Register on page 258 
Store Floating-Point on page 321 
LDSHORTF


7.55 Short Floating-Point Load


Instruction  ASI Value  Operation                                Assembly Language Syntax                 Class
LDSHORTF     D0₁₆       8-bit load from primary address space    ldda [regaddr] #ASI_FL8_P, freg_rd       C3
                                                                 ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     D1₁₆       8-bit load from secondary address        ldda [regaddr] #ASI_FL8_S, freg_rd       C3
                        space                                    ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     D8₁₆       8-bit load from primary address space,   ldda [regaddr] #ASI_FL8_PL, freg_rd      C3
                        little-endian                            ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     D9₁₆       8-bit load from secondary address        ldda [regaddr] #ASI_FL8_SL, freg_rd      C3
                        space, little-endian                     ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     D2₁₆       16-bit load from primary address space   ldda [regaddr] #ASI_FL16_P, freg_rd      C3
                                                                 ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     D3₁₆       16-bit load from secondary address       ldda [regaddr] #ASI_FL16_S, freg_rd      C3
                        space                                    ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     DA₁₆       16-bit load from primary address space,  ldda [regaddr] #ASI_FL16_PL, freg_rd     C3
                        little-endian                            ldda [reg_plus_imm] %asi, freg_rd
LDSHORTF     DB₁₆       16-bit load from secondary address       ldda [regaddr] #ASI_FL16_SL, freg_rd     C3
                        space, little-endian                     ldda [reg_plus_imm] %asi, freg_rd


[Instruction format: op (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13); when i = 0, imm_asi occupies bits 12:5 and rs2 occupies bits 4:0; when i = 1, simm13 occupies bits 12:0.]


Description Short floating-point load instructions allow an 8- or 16-bit value to be loaded from
memory into a 64-bit floating-point register.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute an LDSHORTF instruction causes an fp_disabled exception.


An 8-bit load places the loaded value in the least significant byte of F_D[rd] and
zeroes in the most-significant three bytes of F_D[rd]. An 8-bit LDSHORTF can be
performed from an arbitrary byte address.


A 16-bit load places the loaded value in the least significant halfword of F_D[rd] and
zeroes in the more-significant halfword of F_D[rd]. A 16-bit LDSHORTF from an
address that is not halfword-aligned (an odd address) causes a
mem_address_not_aligned exception.


Little-endian ASIs transfer data in little-endian format from memory; otherwise,
memory is assumed to be in big-endian byte order.
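

Because these loads are emulated in software on UltraSPARC Architecture 2005 implementations, it may help to see the intended result spelled out. The Python sketch below is illustrative only: ldshortf and its parameters are hypothetical names, memory is modeled as a byte string, and the returned integer stands for the register contents with all bytes above the loaded value assumed to be zero.

```python
def ldshortf(memory, address, size_bits, little_endian=False):
    """Return the zero-extended value an 8- or 16-bit short load would place
    in the destination register.

    memory        -- bytes-like object standing in for the address space
    size_bits     -- 8 or 16
    little_endian -- True for the little-endian (..._PL / ..._SL) ASIs
    """
    if size_bits not in (8, 16):
        raise ValueError("size_bits must be 8 or 16")
    nbytes = size_bits // 8
    # An 8-bit load may use any byte address; a 16-bit load must be even.
    if address % nbytes:
        raise ValueError("mem_address_not_aligned")
    data = bytes(memory[address:address + nbytes])
    return int.from_bytes(data, "little" if little_endian else "big")
```

For a memory image holding the bytes 12 34 (hexadecimal) at address 0, a 16-bit big-endian load yields 0x1234 and the little-endian variant yields 0x3412.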


Programming | LDSHORTF is typically used with the FALIGNDATA instruction
Note | (see Align Address on page 135) to assemble or store 64 bits from
noncontiguous components.


Implementation | LDSHORTF shares an opcode with the LDBLOCKF and LDDFA
Note | instructions; it is distinguished by the ASI used.





In an UltraSPARC Architecture 2005 implementation, these instructions are not
implemented in hardware, cause a data_access_exception exception, and are
emulated in software.


Exceptions VA_watchpoint
data_access_exception
LDSTUB


7.56 Load-Store Unsigned Byte


Instruction  op3      Operation                 Assembly Language Syntax  Class
LDSTUB       00 1101  Load-Store Unsigned Byte  ldstub [address], reg_rd  A1


[Instruction format: op (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13); when i = 0, bits 12:5 are reserved and rs2 occupies bits 4:0; when i = 1, simm13 occupies bits 12:0.]


Description The load-store unsigned byte instruction copies a byte from memory into R[rd], then 
rewrites the addressed byte in memory to all 1’s. The fetched byte is right-justified in 
the destination register R[rd] and zero-filled on the left. 


The operation is performed atomically, that is, without allowing intervening 
interrupts or deferred traps. In a multiprocessor system, two or more virtual 
processors executing LDSTUB, LDSTUBA, CASA, CASXA, SWAP, or SWAPA 
instructions addressing all or parts of the same doubleword simultaneously are 
guaranteed to execute them in an undefined, but serial, order. 
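

LDSTUB is commonly used as a test-and-set primitive. The Python sketch below is illustrative only and models the data movement, not the atomicity: it runs single-threaded, memory is a hypothetical bytearray, and the names ldstub and try_acquire are not part of the architecture. The convention that a zero byte means "lock free" is likewise an assumption of the example.

```python
def ldstub(memory, address):
    """Return the old byte at address and rewrite that byte to all ones."""
    old = memory[address]
    memory[address] = 0xFF
    return old          # zero-extended value that would land in R[rd]


def try_acquire(memory, lock_address):
    """Attempt to take a byte lock; True means the caller now owns it.

    Assumes the convention that 0x00 means free and 0xFF means held.
    """
    return ldstub(memory, lock_address) == 0
```

A lock holder would release the lock by storing a zero byte back to the same address; a second try_acquire on a held lock sees 0xFF and fails.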


LDSTUB accesses memory using the implicit ASI (see page 104). The effective
address for this instruction is “R[rs1] + R[rs2]” if i = 0, or
“R[rs1] + sign_ext (simm13)” if i = 1.


The coherence and atomicity of memory operations between virtual processors and
I/O DMA memory accesses are implementation dependent (impl. dep. #120-V9).


An attempt to execute an LDSTUB instruction when i = 0 and instruction bits 12:5
are nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction
VA_watchpoint
data_access_exception
7.57 Load-Store Unsigned Byte to Alternate Space


Instruction     op3      Operation                      Assembly Language Syntax              Class
LDSTUBA^P_ASI   01 1101  Load-Store Unsigned Byte into  ldstuba [regaddr] imm_asi, reg_rd     A1
                         Alternate Space                ldstuba [reg_plus_imm] %asi, reg_rd


[Instruction format: op (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13); when i = 0, imm_asi occupies bits 12:5 and rs2 occupies bits 4:0; when i = 1, simm13 occupies bits 12:0.]


Description The load-store unsigned byte into alternate space instruction copies a byte from 
memory into R[rd], then rewrites the addressed byte in memory to all 1’s. The 
fetched byte is right-justified in the destination register R[rd] and zero-filled on the 
left. 


The operation is performed atomically, that is, without allowing intervening 
interrupts or deferred traps. In a multiprocessor system, two or more virtual 
processors executing LDSTUB, LDSTUBA, CASA, CASXA, SWAP, or SWAPA 
instructions addressing all or parts of the same doubleword simultaneously are 
guaranteed to execute them in an undefined, but serial, order. 


If i = 0, LDSTUBA contains the address space identifier (ASI) to be used for the load
in the imm_asi field. If i = 1, the ASI is found in the ASI register. In nonprivileged
mode (PSTATE.priv = 0), if bit 7 of the ASI is 0, this instruction causes a
privileged_action exception. In privileged mode (PSTATE.priv = 1), if the ASI is in the
range 30₁₆ to 7F₁₆, this instruction causes a privileged_action exception.


LDSTUBA can be used with any of the following ASIs, subject to the privilege mode
rules described for the privileged_action exception above. Use of any other ASI with
this instruction causes a data_access_exception exception.


ASIs valid for LDSTUBA

ASI_NUCLEUS                 ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY      ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY    ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                    ASI_REAL_LITTLE
ASI_PRIMARY                 ASI_PRIMARY_LITTLE
ASI_SECONDARY               ASI_SECONDARY_LITTLE


Exceptions privileged_action
VA_watchpoint
data_access_exception
LDTW (Deprecated)


7.58 Load Integer Twin Word


The LDTW instruction is deprecated and should not be used in new software. It
is provided only for compatibility with previous versions of the architecture. The
LDX instruction should be used instead.


Instruction  op3      Operation               Assembly Language Syntax †  Class
LDTW^D       00 0011  Load Integer Twin Word  ldtw [address], reg_rd      D2


† The original assembly language syntax for this instruction used an “ldd” instruction mnemonic, which is now
deprecated. Over time, assemblers will support the new “ldtw” mnemonic for this instruction. In the meantime,
some existing assemblers may only recognize the original “ldd” mnemonic.


[Instruction format: op (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13); when i = 0, bits 12:5 are reserved and rs2 occupies bits 4:0; when i = 1, simm13 occupies bits 12:0.]


Description The load integer twin word instruction (LDTW) copies two words (with doubleword
alignment) from memory into a pair of R registers. The word at the effective
memory address is copied into the least significant 32 bits of the even-numbered R
register. The word at the effective memory address + 4 is copied into the least
significant 32 bits of the following odd-numbered R register. The most significant 32
bits of both the even-numbered and odd-numbered R registers are zero-filled.


Note | Execution of an LDTW instruction with rd = 0 modifies only 
R[1]. 


Load integer twin word instructions access memory using the implicit ASI (see
page 104). If i = 0, the effective address for these instructions is “R[rs1] + R[rs2]” and
if i = 1, the effective address is “R[rs1] + sign_ext (simm13)”.


With respect to little endian memory, an LDTW instruction behaves as if it comprises 
two 32-bit loads, each of which is byte-swapped independently before being written 
into its respective destination register. 
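

The register and byte-order behavior of LDTW can be sketched as follows. This Python model is illustrative only: ldtw, memory, and regs are hypothetical names, the second word is taken from effective address + 4 (as in the LDTWA description in Section 7.59), and the order in which the two checks are made is an assumption, since exception priority is not stated here.

```python
def ldtw(memory, address, rd, regs, little_endian=False):
    """Model LDTW: load two 32-bit words into R[rd] and R[rd + 1].

    memory -- bytes-like object standing in for the address space
    regs   -- list of 32 integers standing in for the R registers
    """
    if rd % 2:
        raise ValueError("illegal_instruction")       # odd destination
    if address % 8:
        raise ValueError("mem_address_not_aligned")   # not doubleword-aligned
    order = "little" if little_endian else "big"
    # Each word is byte-swapped independently for little-endian accesses.
    even_word = int.from_bytes(memory[address:address + 4], order)
    odd_word = int.from_bytes(memory[address + 4:address + 8], order)
    if rd != 0:
        regs[rd] = even_word        # upper 32 bits are zero-filled
    regs[rd + 1] = odd_word         # with rd = 0, only R[1] changes
```

Note that in the little-endian case the two words stay in their own registers; only the bytes within each word are reversed.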


IMPL. DEP. #107-V9a: It is implementation dependent whether LDTW is 
implemented in hardware. If not, an attempt to execute an LDTW instruction will 
cause an unimplemented_LDTW exception. 


Programming | LDTW is provided for compatibility with existing SPARC V8 
Note | software. It may execute slowly on SPARC V9 machines because 
of data path and register-access difficulties. 
SPARC V9 | LDTW was (inaccurately) named LDD in the SPARC V8 and
Compatibility | SPARC V9 specifications. It does not load a doubleword; it
Note | loads two words (into two registers), and has been renamed
accordingly.


The least significant bit of the rd field in an LDTW instruction is unused and should
always be set to 0 by software. An attempt to execute an LDTW instruction that
refers to a misaligned (odd-numbered) destination register causes an
illegal_instruction exception.


An attempt to execute an LDTW instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.


If the effective address is not doubleword-aligned, an attempt to execute an LDTW
instruction causes a mem_address_not_aligned exception.


A successful LDTW instruction operates atomically.


Exceptions unimplemented_LDTW
illegal_instruction
mem_address_not_aligned
VA_watchpoint
data_access_exception


See Also LDW/LDX on page 227
STTW on page 334
LDTWA (Deprecated)


7.59 Load Integer Twin Word from Alternate Space


The LDTWA instruction is deprecated and should not be used in new software.
The LDXA instruction should be used instead.


Opcode          op3      Operation                              Assembly Language Syntax †          Class
LDTWA^D,P_ASI   01 0011  Load Integer Twin Word from Alternate  ldtwa [regaddr] imm_asi, reg_rd     D2, Y3‡
                         Space                                  ldtwa [reg_plus_imm] %asi, reg_rd


† The original assembly language syntax for this instruction used an “ldda” instruction mnemonic, which is now deprecated. Over time,
assemblers will support the new “ldtwa” mnemonic for this instruction. In the meantime, some assemblers may only recognize the
original “ldda” mnemonic.

‡ Y3 for restricted ASIs (00₁₆-7F₁₆); D2 for unrestricted ASIs (80₁₆-FF₁₆).


[Instruction format: op (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13); when i = 0, imm_asi occupies bits 12:5 and rs2 occupies bits 4:0; when i = 1, simm13 occupies bits 12:0.]


Description The load integer twin word from alternate space instruction (LDTWA) copies two 
32-bit words from memory (with doubleword memory alignment) into a pair of R 
registers. The word at the effective memory address is copied into the least 
significant 32 bits of the even-numbered R register. The word at the effective 
memory address + 4 is copied into the least significant 32 bits of the following odd- 
numbered R register. The most significant 32 bits of both the even-numbered and 
odd-numbered R registers are zero-filled. 


Note | Execution of an LDTWA instruction with rd = 0 modifies only 
R[1]. 


If i = 0, the LDTWA instruction contains the address space identifier (ASI) to be used 
for the load in its imm_asi field and the effective address for the instruction is 
“R[rs1] + R[rs2]". If i = 1, the ASI to be used is contained in the ASI register and the 
effective address for the instruction is “R[rs1] + sign_ext (simm13)”. 


With respect to little endian memory, an LDTWA instruction behaves as if it is 
composed of two 32-bit loads, each of which is byte-swapped independently before 
being written into its respective destination register. 


IMPL. DEP. #107-V9b: It is implementation dependent whether LDTWA is 
implemented in hardware. If not, an attempt to execute an LDTWA instruction will 
cause an unimplemented_LDTW exception so that it can be emulated. 


Programming | LDTWA is provided for compatibility with existing SPARC V8 
Note | software. It may execute slowly on SPARC V9 machines because 
of data path and register-access difficulties. 


If LDTWA is emulated in software, an LDXA instruction
should be used for the memory access in the
emulation code in order to preserve atomicity.


SPARC V9 | LDTWA was (inaccurately) named LDDA in the SPARC V8 and 
Compatibility | SPARC V9 specifications. 
Note 


The least significant bit of the rd field in an LDTWA instruction is unused and
should always be set to 0 by software. An attempt to execute an LDTWA instruction
that refers to a misaligned (odd-numbered) destination register causes an
illegal_instruction exception.


If the effective address is not doubleword-aligned, an attempt to execute an LDTWA
instruction causes a mem_address_not_aligned exception.


A successful LDTWA instruction operates atomically.


In nonprivileged mode (PSTATE.priv = 0), if bit 7 of the ASI is 0, this instruction
causes a privileged_action exception. In privileged mode (PSTATE.priv = 1), if the ASI
is in the range 30₁₆ to 7F₁₆, this instruction causes a privileged_action exception.


LDTWA can be used with any of the following ASIs, subject to the privilege mode
rules described for the privileged_action exception above. Use of any other ASI with
this instruction causes a data_access_exception exception (impl. dep. #300-U4-Cs10).


ASIs valid for LDTWA

ASI_NUCLEUS                            ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY                 ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY               ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                               ASI_REAL_LITTLE
ASI_REAL_IO                            ASI_REAL_IO_LITTLE
22₁₆‡ (ASI_TWINX_AIUP)                 2A₁₆‡ (ASI_TWINX_AIUP_L)
23₁₆‡ (ASI_TWINX_AIUS)                 2B₁₆‡ (ASI_TWINX_AIUS_L)
24₁₆‡ (aliased to 27₁₆, ASI_TWINX_N)   2C₁₆‡ (aliased to 2F₁₆, ASI_TWINX_NL)
26₁₆‡ (ASI_TWINX_REAL)                 2E₁₆‡ (ASI_TWINX_REAL_L)
27₁₆‡ (ASI_TWINX_N)                    2F₁₆‡ (ASI_TWINX_NL)
ASI_PRIMARY                            ASI_PRIMARY_LITTLE
ASI_SECONDARY                          ASI_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT                   ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT                 ASI_SECONDARY_NO_FAULT_LITTLE
E2₁₆‡ (ASI_TWINX_P)                    EA₁₆‡ (ASI_TWINX_PL)
E3₁₆‡ (ASI_TWINX_S)                    EB₁₆‡ (ASI_TWINX_SL)

‡ If this ASI is used with the opcode for LDTWA and i = 0, the LDTXA
instruction is executed instead of LDTWA. For behavior of LDTXA,
see Load Integer Twin Extended Word from Alternate Space on page 255.
If this ASI is used with the opcode for LDTWA and i = 1, behavior is
undefined.
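

The table footnote describes how the shared opcode is resolved. The Python sketch below is illustrative only: select_twin_load is a hypothetical name, and the set of twin-extended-word ASI values is transcribed from the marked entries of the table above (23₁₆ is included on the assumption that it is marked like its neighbors).

```python
# ASI values marked in the table as selecting LDTXA (hexadecimal).
TWINX_ASIS = frozenset({
    0x22, 0x23, 0x24, 0x26, 0x27,
    0x2A, 0x2B, 0x2C, 0x2E, 0x2F,
    0xE2, 0xE3, 0xEA, 0xEB,
})


def select_twin_load(asi, i):
    """Classify an instruction word that carries the LDTWA opcode.

    Returns "LDTXA", "LDTWA", or "undefined" (marked ASI with i = 1).
    """
    if asi in TWINX_ASIS:
        return "LDTXA" if i == 0 else "undefined"
    # Any other ASI follows the LDTWA rules, including the
    # data_access_exception for ASIs not listed as valid.
    return "LDTWA"
```

For instance, ASI 26₁₆ with i = 0 selects LDTXA, whereas ASI 80₁₆ is handled by the LDTWA rules for either value of i.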


Programming | Nontranslating ASIs (see page 399) should only be accessed
Note | using LDXA (not LDTWA) instructions. If an LDTWA
referencing a nontranslating ASI is executed, per the above
table, it generates a data_access_exception exception (impl. dep.
#300-U4-Cs10).


Implementation | The deprecated instruction LDTWA shares an opcode with
Note | LDTXA. LDTXA is not deprecated and has different address
alignment requirements than LDTWA. See Load Integer Twin
Extended Word from Alternate Space on page 255.


Exceptions unimplemented_LDTW
illegal_instruction
mem_address_not_aligned
privileged_action
VA_watchpoint
data_access_exception


See Also LDWA/LDXA on page 229 
LDTXA on page 255 
STTWA on page 336 
7.60 Load Integer Twin Extended Word from Alternate Space [VIS 2+]


The LDTXA instructions are not guaranteed to be implemented on all
UltraSPARC Architecture implementations. Therefore, they should only be
used in platform-specific dynamically-linked libraries or in software created by a
runtime code generator that is aware of the specific virtual processor
implementation on which it is executing.









































Instruction  ASI Value  Operation                                Assembly Language Syntax †                  Class
LDTXA^N      22₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_AIUP, reg_rd     N1
                        as if user (nonprivileged), Primary
                        address space
             23₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_AIUS, reg_rd     N1
                        as if user (nonprivileged), Secondary
                        address space
             26₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_REAL, reg_rd     N1
                        real address
             27₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_N, reg_rd        N1
                        nucleus context
             2A₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_AIUP_L, reg_rd   N1
                        as if user (nonprivileged), Primary
                        address space, little-endian
             2B₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_AIUS_L, reg_rd   N1
                        as if user (nonprivileged), Secondary
                        address space, little-endian
             2E₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_REAL_L, reg_rd   N1
                        real address, little-endian
             2F₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_NL, reg_rd       N1
                        nucleus context, little-endian
LDTXA^N      E2₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_P, reg_rd        N1
                        Primary address space
             E3₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_S, reg_rd        N1
                        Secondary address space
             EA₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_PL, reg_rd       N1
                        Primary address space, little-endian
             EB₁₆       Load Integer Twin Extended Word,         ldtxa [regaddr] #ASI_TWINX_SL, reg_rd       N1
                        Secondary address space, little-endian


† The original assembly language syntax for these instructions used the “ldda” instruction mnemonic. That syntax is now deprecated.
Over time, assemblers will support the new “ldtxa” mnemonic for this instruction. In the meantime, some existing assemblers may
only recognize the original “ldda” mnemonic.
[Instruction format: op (bits 31:30), rd (29:25), op3 (24:19), rs1 (18:14), i (13), imm_asi (12:5), rs2 (4:0).]


Description ASIs 26₁₆, 2E₁₆, E2₁₆, E3₁₆, F0₁₆, and F1₁₆ are used with the LDTXA instruction to
atomically read a 128-bit data item into a pair of 64-bit R registers (a “twin extended
word”). The data are placed in an even/odd pair of 64-bit registers. The lowest-
address 64 bits are placed in the even-numbered register; the highest-address 64 bits
are placed in the odd-numbered register.


Note | Execution of an LDTXA instruction with rd = 0 modifies only
R[1].


ASIs E2₁₆, E3₁₆, F0₁₆, and F1₁₆ perform an access using a virtual address, while ASIs
26₁₆ and 2E₁₆ use a real address.


An LDTXA instruction that performs a little-endian access behaves as if it comprises 
two 64-bit loads (performed atomically), each of which is byte-swapped 
independently before being written into its respective destination register. 


Exceptions. An attempt to execute an LDTXA instruction with an odd-numbered
destination register (rd{0} = 1) causes an illegal_instruction exception.


An attempt to execute an LDTXA instruction with an effective memory address that
is not aligned on a 16-byte boundary causes a mem_address_not_aligned exception.
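
The register placement and the two checks above can be modeled in a few lines of C. This is only an illustrative sketch, not part of the architecture definition: the function names, the status enum, and the use of a byte array to stand in for memory are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch (not architectural names). */
enum ldtxa_status {
    LDTXA_OK,
    LDTXA_ILLEGAL_INSTRUCTION,
    LDTXA_MEM_ADDRESS_NOT_ALIGNED
};

/* Assemble 8 bytes into a 64-bit value, big- or little-endian. */
static uint64_t load64(const uint8_t *p, bool little_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        int idx = little_endian ? 7 - i : i;
        v = (v << 8) | p[idx];
    }
    return v;
}

/* mem stands in for memory; addr is a byte offset into it. */
static enum ldtxa_status ldtxa_model(const uint8_t *mem, uint64_t addr,
                                     unsigned rd, bool little_endian,
                                     uint64_t r[32])
{
    if (rd & 1u)
        return LDTXA_ILLEGAL_INSTRUCTION;      /* odd rd */
    if (addr & 0xFu)
        return LDTXA_MEM_ADDRESS_NOT_ALIGNED;  /* not 16-byte aligned */

    /* Each 64-bit half is byte-swapped independently when little-endian. */
    uint64_t even = load64(mem + addr, little_endian);
    uint64_t odd  = load64(mem + addr + 8, little_endian);

    if (rd != 0)
        r[rd] = even;   /* with rd = 0, only R[1] is modified */
    r[rd + 1] = odd;
    return LDTXA_OK;
}
```

With a 32-byte buffer holding the values 0 through 31, a big-endian load from offset 16 into rd = 2 leaves the lowest-address 8 bytes in R[2] and the next 8 bytes in R[3].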


IMPL. DEP. #413-S10: It is implementation dependent whether
VA_watchpoint exceptions are recognized on accesses to all 16 bytes of an LDTXA
instruction (the recommended behavior) or only on accesses to the first 8 bytes.


An attempted access by an LDTXA instruction to noncacheable memory causes a
data_access_exception exception (impl. dep. #306-U4-Cs10).


Programming | A key use for this instruction is to read a full TTE entry (128 bits, 
Note | tag and data) in a TSB directly, without using software 
interlocks. The "real address" variants can perform the access 
using a real address, bypassing the VA-to-RA translation. 


The virtual processor MMU does not provide virtual-to-real translation for ASIs 26₁₆
and 2E₁₆; the effective address provided with either of those ASIs is interpreted
directly as a real address.


Compatibility | ASIs 27₁₆, 2F₁₆, 26₁₆, and 2E₁₆ are now standard ASIs that
Note | replace (respectively) ASIs 24₁₆, 2C₁₆, 34₁₆, and 3C₁₆ that were
supported in some previous UltraSPARC implementations.


A mem_address_not_aligned trap is taken if the access is not aligned on a 16-byte
(128-bit) boundary.
Implementation | LDTXA shares an opcode with the "i = 0" variant of the
Note | (deprecated) LDTWA instruction; they are differentiated by the
combination of the value of "i" and the ASI used in the
instruction. See Load Integer Twin Word from Alternate Space on
page 252.


Exceptions illegal_instruction
mem_address_not_aligned
privileged_action
VA_watchpoint (impl. dep. #413-S10)
data_access_exception


See Also LDTWA on page 252 
7.61 Load Floating-Point State Register


Instruction  op3      rd    Operation                           Assembly Language Syntax  Class
-            10 0001  0     (see page 243)
LDXFSR       10 0001  1     Load Floating-Point State Register  ldx [address], %fsr       A1
-            10 0001  2-31  Reserved


[Instruction format diagram; bit boundaries at 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0]


Description A load floating-point state register instruction (LDXFSR) waits for all FPop 
instructions that have not finished execution to complete and then loads a 
doubleword from memory into the FSR. 


LDXFSR does not alter the ver, ftt, qne, reserved, or unimplemented (for example, 
ns) fields of FSR (see page 58). 


Programming | For future compatibility, software should only issue an LDXFSR 
Note | instruction with a zero value (or a value previously read from 
the same field) written into any reserved field of FSR. 


LDXFSR accesses memory using the implicit ASI (see page 104). 


If i = 0, the effective address for these instructions is "R[rs1] + R[rs2]" and if i = 1,
the effective address is "R[rs1] + sign_ext(simm13)".
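
The effective-address rule can be sketched in C as follows. The field positions used here (rs1 in bits 18:14, i in bit 13, simm13 in bits 12:0, rs2 in bits 4:0) are the standard SPARC format 3 layout; the function names and the register-file array are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Sign-extend the simm13 field (instruction bits 12:0). */
static int64_t sign_ext13(uint32_t insn)
{
    int64_t v = (int64_t)(insn & 0x1FFFu);
    return (v & 0x1000) ? v - 0x2000 : v;
}

/* Compute the effective address from an instruction word and a
 * register file r[] (a stand-in for the R registers). */
static uint64_t effective_address(uint32_t insn, const uint64_t r[32])
{
    unsigned rs1 = (insn >> 14) & 0x1Fu;
    unsigned i   = (insn >> 13) & 1u;

    if (i == 0) {
        unsigned rs2 = insn & 0x1Fu;
        return r[rs1] + r[rs2];
    }
    return r[rs1] + (uint64_t)sign_ext13(insn);
}

/* LDXFSR requires a doubleword-aligned (8-byte) effective address. */
static int is_doubleword_aligned(uint64_t ea)
{
    return (ea & 7u) == 0;
}
```

An address that fails `is_doubleword_aligned` corresponds to the mem_address_not_aligned case for LDXFSR.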


Exceptions. An attempt to execute an instruction encoded as op = 3 and op3 = 21₁₆
when any of the following conditions exist causes an illegal_instruction exception:


■ i = 0 and instruction bits 12:5 are nonzero
■ rd > 1


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present,
an attempt to execute an LDXFSR instruction causes an fp_disabled exception.


If the effective address is not doubleword-aligned, an attempt to execute an LDXFSR
instruction causes a mem_address_not_aligned exception.


Destination Register(s) when Exception Occurs. If a load floating-point state 
register instruction generates an exception that causes a precise trap, the destination 
register (FSR) remains unchanged. 
IMPL. DEP. #44-V8-Cs10(a)(2): If an LDXFSR instruction generates an exception
that causes a non-precise trap, it is implementation dependent whether the contents
of the destination register (FSR) are undefined or are guaranteed to remain unchanged.


Implementation | LDXFSR shares an opcode with the (deprecated) LDFSR
Note | instruction (and possibly with other implementation-dependent
instructions); they are differentiated by the instruction rd field.
An attempt to execute the op = 11₂, op3 = 10 0001₂ opcode with
an invalid rd value causes an illegal_instruction exception.


Exceptions illegal_instruction
fp_disabled
mem_address_not_aligned
VA_watchpoint
data_access_exception


See Also Load Floating-Point Register on page 236
Load Floating-Point State Register (Lower) on page 243
Store Floating-Point State Register on page 339
MEMBAR


7.62 Memory Barrier


Instruction  op3      Operation       Assembly Language Syntax  Class
MEMBAR       10 1000  Memory Barrier  membar membar_mask        A1


[Instruction format diagram: rd = 0 (bits 29:25), op3 (24:19), rs1 = 15 (18:14), i = 1 (13), reserved (12:7), cmask (6:4), mmask (3:0)]





Description The memory barrier instruction, MEMBAR, has two complementary functions: to 
express order constraints between memory references and to provide explicit control 
of memory-reference completion. The membar_mask field in the suggested assembly
language is the concatenation of the cmask and mmask instruction fields.


MEMBAR introduces an order constraint between classes of memory references 
appearing before the MEMBAR and memory references following it in a program. 
The particular classes of memory references are specified by the mmask field. 
Memory references are classified as loads (including load instructions LDSTUB[A],
SWAP[A], CASA, and CASXA) and stores (including store instructions LDSTUB[A],
SWAP[A], CASA, CASXA, and FLUSH). The mmask field specifies the classes of
memory references subject to ordering, as described below. MEMBAR applies to all 
memory operations in all address spaces referenced by the issuing virtual processor, 
but it has no effect on memory references by other virtual processors. When the 
cmask field is nonzero, completion as well as order constraints are imposed, and the 
order imposed can be more stringent than that specifiable by the mmask field alone. 


A load has been performed when the value loaded has been transmitted from 
memory and cannot be modified by another virtual processor. A store has been 
performed when the value stored has become visible, that is, when the previous 
value can no longer be read by any virtual processor. In specifying the effect of 
MEMBAR, instructions are considered to be executed as if they were processed in a 
strictly sequential fashion, with each instruction completed before the next has 
begun. 


The mmask field is encoded in bits 3 through 0 of the instruction. TABLE 7-7 specifies 
the order constraint that each bit of mmask (selected when set to 1) imposes on 
memory references appearing before and after the MEMBAR. From zero to four 
mask bits may be selected in the mmask field. 
TABLE 7-7 MEMBAR mmask Encodings

Mask Bit   Assembly Language Name   Description

mmask{3}   #StoreStore              The effects of all stores appearing prior to the MEMBAR
                                    instruction must be visible to all virtual processors before the
                                    effect of any stores following the MEMBAR.

mmask{2}   #LoadStore               All loads appearing prior to the MEMBAR instruction must
                                    have been performed before the effects of any stores following
                                    the MEMBAR are visible to any other virtual processor.

mmask{1}   #StoreLoad               The effects of all stores appearing prior to the MEMBAR
                                    instruction must be visible to all virtual processors before loads
                                    following the MEMBAR may be performed.

mmask{0}   #LoadLoad                All loads appearing prior to the MEMBAR instruction must
                                    have been performed before any loads following the MEMBAR
                                    may be performed.


The cmask field is encoded in bits 6 through 4 of the instruction. Bits in the cmask 
field, described in TABLE 7-8, specify additional constraints on the order of memory 
references and the processing of instructions. If cmask is zero, then MEMBAR 
enforces the partial ordering specified by the mmask field; if cmask is nonzero, then 
completion and partial order constraints are applied. 


TABLE 7-8 MEMBAR cmask Encodings

Mask Bit   Function                 Assembly Language Name   Description

cmask{2}   Synchronization barrier  #Sync                    All operations (including nonmemory
                                                             reference operations) appearing prior to the
                                                             MEMBAR must have been performed and
                                                             the effects of any exceptions be visible before
                                                             any instruction after the MEMBAR may be
                                                             initiated.

cmask{1}   Memory issue barrier     #MemIssue                All memory reference operations appearing
                                                             prior to the MEMBAR must have been
                                                             performed before any memory operation
                                                             after the MEMBAR may be initiated.

cmask{0}   Lookaside barrier        #Lookaside               A store appearing prior to the MEMBAR
                                                             must complete before any load following the
                                                             MEMBAR referencing the same address can
                                                             be initiated.


A MEMBAR instruction with both mmask = 0 and cmask = 0 is functionally a NOP. 
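
As an illustration of how the two mask fields combine into an instruction word, the following C sketch builds a MEMBAR encoding. It assumes the standard SPARC V9 field layout (op = 10₂ in bits 31:30, rd = 0, op3 = 10 1000₂, rs1 = 15, i = 1, cmask in bits 6:4, mmask in bits 3:0); the enum and function names are invented for this example.

```c
#include <stdint.h>

/* mmask bits (instruction bits 3:0); names are illustrative. */
enum {
    MM_LOADLOAD   = 1u << 0,
    MM_STORELOAD  = 1u << 1,
    MM_LOADSTORE  = 1u << 2,
    MM_STORESTORE = 1u << 3
};

/* cmask bits (instruction bits 6:4, before shifting). */
enum {
    CM_LOOKASIDE = 1u << 0,
    CM_MEMISSUE  = 1u << 1,
    CM_SYNC      = 1u << 2
};

/* Build a MEMBAR instruction word from cmask and mmask. */
static uint32_t membar_encode(unsigned cmask, unsigned mmask)
{
    return (2u << 30)               /* op = 10 (binary)       */
         | (0u << 25)               /* rd = 0                 */
         | (0x28u << 19)            /* op3 = 10 1000 (binary) */
         | (15u << 14)              /* rs1 = 15               */
         | (1u << 13)               /* i = 1                  */
         | ((cmask & 7u) << 4)      /* cmask, bits 6:4        */
         | (mmask & 0xFu);          /* mmask, bits 3:0        */
}

/* mmask = 0 and cmask = 0: functionally a NOP. */
static int membar_is_nop(uint32_t insn)
{
    return (insn & 0x7Fu) == 0;
}

/* Bits 12:7 are reserved; nonzero means illegal_instruction. */
static int membar_reserved_bits_set(uint32_t insn)
{
    return ((insn >> 7) & 0x3Fu) != 0;
}
```

Under these assumptions, `membar_encode(CM_SYNC, 0)` yields the word 8143E040₁₆ for MEMBAR #Sync, and mask bits can be ORed together to request several constraints at once.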


For information on the use of MEMBAR, see Memory Ordering and Synchronization on 
page 393 and Programming with the Memory Models contained in the separate volume 
UltraSPARC Architecture Application Notes. For additional information about the 
memory models themselves, see Chapter 9, Memory. 
The coherence and atomicity of memory operations between virtual processors and
I/O DMA memory accesses are implementation dependent (impl. dep. #120-V9).


V9 Compatibility | MEMBAR with mmask = 8₁₆ and cmask = 0₁₆ (MEMBAR
Note | #StoreStore) is identical in function to the SPARC V8 STBAR
instruction, which is deprecated.


An attempt to execute a MEMBAR instruction when instruction bits 12:7 are nonzero
causes an illegal_instruction exception.


Implementation | MEMBAR shares an opcode with RDasr; it is distinguished by
Note | rs1 = 15, rd = 0, i = 1, and bit 12 = 0.



7.62.1 Memory Synchronization 


The UltraSPARC Architecture provides some level of software control over memory 
synchronization, through use of the MEMBAR and FLUSH instructions for explicit 
control of memory ordering in program execution. 


IMPL. DEP. #412-S10: An UltraSPARC Architecture implementation may define the 
operation of each MEMBAR variant in any manner that provides the required 
semantics. 


Implementation | For an UltraSPARC Architecture virtual processor that only 

Note | provides TSO memory ordering semantics, three of the ordering 
MEMBARs would normally be implemented as NOPs. TABLE 7-9 
shows an acceptable implementation of MEMBAR for a TSO- 
only UltraSPARC Architecture implementation. 


TABLE 7-9 MEMBAR Semantics for TSO-only implementation

MEMBAR variant   Preferred Implementation
#StoreStore      NOP
#LoadStore       NOP
#StoreLoad       #Sync
#LoadLoad        NOP
#Sync            #Sync
#MemIssue        #Sync
#Lookaside       #Sync








If an UltraSPARC Architecture implementation provides a less 
restrictive memory model than TSO (for example, RMO), the 
implementation of the MEMBAR variants may be different. See 
implementation-specific documentation for details. 
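
The TSO-only mapping in TABLE 7-9 can be expressed as a small decision function. The sketch below is one possible reading, under the assumption that a MEMBAR combining several variants takes the strongest action required by any of its set bits; the type and function names are hypothetical.

```c
/* Possible actions for a TSO-only implementation (illustrative names). */
enum tso_action { TSO_NOP, TSO_SYNC };

/* Decide how a TSO-only implementation might treat a MEMBAR,
 * given its cmask (bits 6:4) and mmask (bits 3:0) values. */
static enum tso_action membar_tso_action(unsigned cmask, unsigned mmask)
{
    /* #Sync, #MemIssue, and #Lookaside are all treated as #Sync. */
    if ((cmask & 7u) != 0)
        return TSO_SYNC;

    /* #StoreLoad (mmask{1}) is the only ordering variant that TSO
     * does not already guarantee. */
    if (mmask & 2u)
        return TSO_SYNC;

    /* #StoreStore, #LoadStore, #LoadLoad: already implied by TSO. */
    return TSO_NOP;
}
```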
7.62.2 Synchronization of the Virtual Processor


Synchronization of a virtual processor forces all outstanding instructions to be
completed and any associated hardware errors to be detected and reported before
any instruction after the synchronizing instruction is issued.


Synchronization can be explicitly caused by executing a synchronizing MEMBAR
instruction (MEMBAR #Sync) or by executing an LDXA/STXA/LDDFA/STDFA
instruction with an ASI that forces synchronization.


Programming | Completion of a MEMBAR #Sync instruction does not
Note | guarantee that data previously stored has been written all the
way out to external memory. Software cannot rely on that
behavior. There is no mechanism in the UltraSPARC
Architecture that allows software to wait for all previous stores
to be written to external memory.


7.62.3 TSO Ordering Rules affecting Use of MEMBAR


For detailed rules on use of MEMBAR to enable software to adhere to the ordering
rules on a virtual processor running with the TSO memory model, refer to TSO
Ordering Rules on page 390.


Exceptions illegal_instruction
MOVcc


7.63 Move Integer Register on Condition (MOVcc)


For Integer Condition Codes


Instruction  op3      cond  Operation                                          icc / xcc Test         Assembly Language Syntax                  Class
MOVA         10 1100  1000  Move Always                                        1                      mova    i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVN         10 1100  0000  Move Never                                         0                      movn    i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVNE        10 1100  1001  Move if Not Equal                                  not Z                  movne†  i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVE         10 1100  0001  Move if Equal                                      Z                      move‡   i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVG         10 1100  1010  Move if Greater                                    not (Z or (N xor V))   movg    i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVLE        10 1100  0010  Move if Less or Equal                              Z or (N xor V)         movle   i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVGE        10 1100  1011  Move if Greater or Equal                           not (N xor V)          movge   i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVL         10 1100  0011  Move if Less                                       N xor V                movl    i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVGU        10 1100  1100  Move if Greater, Unsigned                          not (C or Z)           movgu   i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVLEU       10 1100  0100  Move if Less or Equal, Unsigned                    (C or Z)               movleu  i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVCC        10 1100  1101  Move if Carry Clear (Greater or Equal, Unsigned)   not C                  movcc◊  i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVCS        10 1100  0101  Move if Carry Set (Less than, Unsigned)            C                      movcs∇  i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVPOS       10 1100  1110  Move if Positive                                   not N                  movpos  i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVNEG       10 1100  0110  Move if Negative                                   N                      movneg  i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVVC        10 1100  1111  Move if Overflow Clear                             not V                  movvc   i_or_x_cc, reg_or_imm11, reg_rd   A1
MOVVS        10 1100  0111  Move if Overflow Set                               V                      movvs   i_or_x_cc, reg_or_imm11, reg_rd   A1

† synonym: movnz    ‡ synonym: movz    ◊ synonym: movgeu    ∇ synonym: movlu


Programming | In assembly language, to select the appropriate condition code,
Note | include %icc or %xcc before the reg_or_imm11 field.
For Floating-Point Condition Codes


Instruction  op3      cond  Operation                                   fcc Test       Assembly Language Syntax               Class
MOVFA        10 1100  1000  Move Always                                 1              mova    %fccn, reg_or_imm11, reg_rd    A1
MOVFN        10 1100  0000  Move Never                                  0              movn    %fccn, reg_or_imm11, reg_rd    A1
MOVFU        10 1100  0111  Move if Unordered                           U              movu    %fccn, reg_or_imm11, reg_rd    A1
MOVFG        10 1100  0110  Move if Greater                             G              movg    %fccn, reg_or_imm11, reg_rd    A1
MOVFUG       10 1100  0101  Move if Unordered or Greater                G or U         movug   %fccn, reg_or_imm11, reg_rd    A1
MOVFL        10 1100  0100  Move if Less                                L              movl    %fccn, reg_or_imm11, reg_rd    A1
MOVFUL       10 1100  0011  Move if Unordered or Less                   L or U         movul   %fccn, reg_or_imm11, reg_rd    A1
MOVFLG       10 1100  0010  Move if Less or Greater                     L or G         movlg   %fccn, reg_or_imm11, reg_rd    A1
MOVFNE       10 1100  0001  Move if Not Equal                           L or G or U    movne†  %fccn, reg_or_imm11, reg_rd    A1
MOVFE        10 1100  1001  Move if Equal                               E              move‡   %fccn, reg_or_imm11, reg_rd    A1
MOVFUE       10 1100  1010  Move if Unordered or Equal                  E or U         movue   %fccn, reg_or_imm11, reg_rd    A1
MOVFGE       10 1100  1011  Move if Greater or Equal                    E or G         movge   %fccn, reg_or_imm11, reg_rd    A1
MOVFUGE      10 1100  1100  Move if Unordered or Greater or Equal       E or G or U    movuge  %fccn, reg_or_imm11, reg_rd    A1
MOVFLE       10 1100  1101  Move if Less or Equal                       E or L         movle   %fccn, reg_or_imm11, reg_rd    A1
MOVFULE      10 1100  1110  Move if Unordered or Less or Equal          E or L or U    movule  %fccn, reg_or_imm11, reg_rd    A1
MOVFO        10 1100  1111  Move if Ordered                             E or L or G    movo    %fccn, reg_or_imm11, reg_rd    A1

† synonym: movnz    ‡ synonym: movz


Programming | In assembly language, to select the appropriate condition code,
Note | include %fcc0, %fcc1, %fcc2, or %fcc3 before the reg_or_imm11
field.


[Instruction format diagram; bit boundaries at 31:30, 29:25, 24:19, 18, 17:14, 13, 12, 11, 10:0]
cc2  cc1  cc0  Condition Code
0    0    0    fcc0
0    0    1    fcc1
0    1    0    fcc2
0    1    1    fcc3
1    0    0    icc
1    0    1    Reserved (illegal_instruction)
1    1    0    xcc
1    1    1    Reserved (illegal_instruction)








Description These instructions test to see if cond is TRUE for the selected condition codes. If so, 
they copy the value in R[rs2] if i field = 0, or “sign_ext(simm11)” if i = 1 into R[rd]. 
The condition code used is specified by the cc2, cc1, and cc0 fields of the 
instruction. If the condition is FALSE, then R[rd] is not changed. 





These instructions copy an integer register to another integer register if the condition
is TRUE. The condition code that is used to determine whether the move will occur
can be either an integer condition code (icc or xcc) or any floating-point condition code
(fcc0, fcc1, fcc2, or fcc3).





These instructions do not modify any condition codes. 
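
The integer-condition tests in the MOVcc table follow a regular pattern: the low three bits of cond select a base test, and cond{3} inverts it. The C sketch below models that pattern for one set of integer condition codes. It is illustrative only; the struct, the helper that derives condition codes from a 64-bit compare, and all function names are assumptions for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* One set of integer condition codes (icc or xcc). */
typedef struct { bool n, z, v, c; } cc_t;

/* Evaluate a 4-bit MOVcc cond field against integer condition codes. */
static bool icc_cond_true(unsigned cond, cc_t cc)
{
    bool t;
    switch (cond & 7u) {
    case 0:  t = false;                  break;  /* MOVN   */
    case 1:  t = cc.z;                   break;  /* MOVE   */
    case 2:  t = cc.z || (cc.n != cc.v); break;  /* MOVLE  */
    case 3:  t = (cc.n != cc.v);         break;  /* MOVL   */
    case 4:  t = cc.c || cc.z;           break;  /* MOVLEU */
    case 5:  t = cc.c;                   break;  /* MOVCS  */
    case 6:  t = cc.n;                   break;  /* MOVNEG */
    default: t = cc.v;                   break;  /* MOVVS  */
    }
    return (cond & 8u) ? !t : t;  /* cond{3} = 1 selects the inverse test */
}

/* Condition codes as a 64-bit compare (a - b) would set xcc. */
static cc_t cc_from_cmp(uint64_t a, uint64_t b)
{
    uint64_t res = a - b;
    cc_t cc;
    cc.n = (res >> 63) & 1u;
    cc.z = (res == 0);
    cc.v = (((a ^ b) & (a ^ res)) >> 63) & 1u;
    cc.c = (a < b);               /* borrow out of the subtraction */
    return cc;
}

/* R[rd] receives the operand only if the condition holds. */
static uint64_t movcc_model(unsigned cond, cc_t cc,
                            uint64_t operand, uint64_t old_rd)
{
    return icc_cond_true(cond, cc) ? operand : old_rd;
}
```

For example, with X preset to 1, `movcc_model(2, cc_from_cmp(A, B), 0, 1)` (MOVLE) leaves 1 when A > B and writes 0 otherwise.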


Programming | Branches cause the performance of many implementations to
Note | degrade significantly. Frequently, the MOVcc and FMOVcc
instructions can be used to avoid branches. For example, the C
language if-then-else statement


if (A > B) X = 1; else X = 0;


can be coded as


cmp    %i0,%i2
bg,a   %xcc,label
or     %g0,1,%i3     ! X = 1
or     %g0,0,%i3     ! X = 0
label:...


The above sequence requires four instructions, including a branch.
With MOVcc this could be coded as:


cmp    %i0,%i2
or     %g0,1,%i3     ! assume X = 1
movle  %xcc,0,%i3    ! overwrite with X = 0


This approach takes only three instructions and no branches and
may boost performance significantly. Use MOVcc and FMOVcc
instead of branches wherever these instructions would increase
performance.





An attempt to execute a MOVcc instruction when either i = 0 and instruction bits 10:5 are
nonzero or (cc2 :: cc1 :: cc0) = 101₂ or 111₂ causes an illegal_instruction exception.


If cc2 = 0 (that is, a floating-point condition code is being referenced in the MOVcc
instruction) and either the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or
no FPU is present, an attempt to execute a MOVcc instruction causes an fp_disabled
exception.


Exceptions illegal_instruction
fp_disabled
7.64 Move Integer Register on Register Condition (MOVr)


Instruction  op3      rcond  Operation                                            Test        Assembly Language Syntax                  Class
-            10 1111  000    Reserved (illegal_instruction)                                                                             -
MOVRZ        10 1111  001    Move if Register Zero                                R[rs1] = 0  movrz†   reg_rs1, reg_or_imm10, reg_rd    A1
MOVRLEZ      10 1111  010    Move if Register Less Than or Equal to Zero          R[rs1] ≤ 0  movrlez  reg_rs1, reg_or_imm10, reg_rd    A1
MOVRLZ       10 1111  011    Move if Register Less Than Zero                      R[rs1] < 0  movrlz   reg_rs1, reg_or_imm10, reg_rd    A1
-            10 1111  100    Reserved (illegal_instruction)                                                                             -
MOVRNZ       10 1111  101    Move if Register Not Zero                            R[rs1] ≠ 0  movrnz‡  reg_rs1, reg_or_imm10, reg_rd    A1
MOVRGZ       10 1111  110    Move if Register Greater Than Zero                   R[rs1] > 0  movrgz   reg_rs1, reg_or_imm10, reg_rd    A1
MOVRGEZ      10 1111  111    Move if Register Greater Than or Equal to Zero       R[rs1] ≥ 0  movrgez  reg_rs1, reg_or_imm10, reg_rd    A1

† synonym: movre    ‡ synonym: movrne


[Instruction format diagram: fields 10, rd, op3, rs1, i, rcond, rs2 (or simm10 when i = 1); bit boundaries at 31:30, 29:25, 24:19, 18:14, 13, 12:10, 9:5, 4:0]


Description If the contents of integer register R[rs1] satisfy the condition specified in the rcond
field, these instructions copy their second operand (if i = 0, R[rs2]; if i = 1,
sign_ext(simm10)) into R[rd]. If the contents of R[rs1] do not satisfy the condition,
then R[rd] is not modified.


These instructions treat the register contents as a signed integer value; they do not 
modify any condition codes. 
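
A compact C model of the MOVr behavior, covering the six valid rcond encodings and the two reserved ones, might look like the following. The function names and the boolean return convention (false standing in for the reserved-encoding case) are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the simm10 field (instruction bits 9:0). */
static int64_t sign_ext10(uint32_t insn)
{
    int64_t v = (int64_t)(insn & 0x3FFu);
    return (v & 0x200) ? v - 0x400 : v;
}

/* Model of MOVr: tests the full 64-bit signed value of R[rs1].
 * Returns false for the reserved rcond encodings (000 and 100),
 * which would raise illegal_instruction; *rd is untouched then. */
static bool movr_model(unsigned rcond, uint64_t rs1_value,
                       uint64_t operand, uint64_t *rd)
{
    int64_t v = (int64_t)rs1_value;  /* register contents as signed */
    bool take;

    switch (rcond & 7u) {
    case 1: take = (v == 0); break;  /* MOVRZ   */
    case 2: take = (v <= 0); break;  /* MOVRLEZ */
    case 3: take = (v <  0); break;  /* MOVRLZ  */
    case 5: take = (v != 0); break;  /* MOVRNZ  */
    case 6: take = (v >  0); break;  /* MOVRGZ  */
    case 7: take = (v >= 0); break;  /* MOVRGEZ */
    default: return false;           /* reserved encodings */
    }
    if (take)
        *rd = operand;
    return true;
}
```

Because the test uses all 64 bits, a register holding 1 0000 0000₁₆ is not zero for MOVRZ even though its low 32 bits are zero.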


Programming | The MOVr instructions are “64-bit-only” instructions; there is no 
Note | version of these instructions that operates on just the less- 
significant 32 bits of their source operands. 
Implementation | If this instruction is implemented by tagging each register value
Note | with an n (negative) and a z (zero) bit, use the table below to
determine if rcond is TRUE.


Move      Test
MOVRNZ    not Z
MOVRZ     Z
MOVRGEZ   not N
MOVRLZ    N
MOVRLEZ   N or Z
MOVRGZ    N nor Z


An attempt to execute a MOVr instruction when either i = 0 and instruction bits 9:5 are
nonzero or rcond = 000₂ or 100₂ causes an illegal_instruction exception.


Exceptions illegal_instruction
MULScc - Deprecated


7.65 Multiply Step


The MULScc instruction is deprecated and should not be used in new software.
The MULX instruction should be used instead.


Opcode    op3      Operation                       Assembly Language Syntax             Class
MULSccᴰ   10 0100  Multiply Step and modify cc's   mulscc reg_rs1, reg_or_imm, reg_rd   Y3


[Instruction format diagram; bit boundaries at 31:30, 29:25, 24:19, 18:14, 13, 12:0]


Description MULScc treats the less-significant 32 bits of R[rs1] and the less-significant 32 bits of
the Y register as a single 64-bit, right-shiftable doubleword register. The least
significant bit of R[rs1] is treated as if it were adjacent to bit 31 of the Y register. The
MULScc instruction performs an addition operation, based on the least significant
bit of Y.


Multiplication assumes that the Y register initially contains the multiplier, R[rs1] 
contains the most significant bits of the product, and R[rs2] contains the 
multiplicand. Upon completion of the multiplication, the Y register contains the least 
significant bits of the product. 


Note | In a standard MULScc instruction, rs1 = rd.


MULScc operates as follows:

1. If i = 0, the multiplicand is R[rs2]; if i = 1, the multiplicand is sign_ext(simm13).

2. A 32-bit value is computed by shifting the value from R[rs1] right by one bit with
“CCR.icc.n xor CCR.icc.v” replacing bit 31 of R[rs1]. (This is the proper sign for
the previous partial product.)

3. If the least significant bit of Y = 1, the shifted value from step (2) and the
multiplicand are added. If the least significant bit of Y = 0, then 0 is added to
the shifted value from step (2).
4. MULScc writes the following result values:

Register field   Value written by MULScc
CCR.icc          updated according to the result of the addition in step (3) above
R[rd]{63:32}     undefined
R[rd]{31:0}      the least-significant 32 bits of the sum from step (3) above
Y                the previous value of the Y register, shifted right by one
                 bit, with Y{31} replaced by the value of R[rs1]{0} prior to
                 shifting in step (2)
CCR.xcc          undefined





5. The Y register is shifted right by one bit, with the least significant bit of the 
unshifted R[rs1] replacing bit 31 of Y. 
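
The steps above can be sketched in C as a single-step function, together with a driver loop that chains steps into a full multiply. The driver is an assumption on my part: it follows the conventional sequence of 32 steps with the multiplicand followed by one final step with a zero multiplicand, and it is only exercised here for operands below 2³¹, where no sign correction is needed. All names are hypothetical, and rs1 = rd is assumed as in a standard MULScc instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* State touched by MULScc: low 32 bits of R[rd] (= R[rs1]), Y, and icc. */
typedef struct {
    uint32_t rd;
    uint32_t y;
    bool n, z, v, c;
} muls_state;

/* One multiply step, following steps (2) through (5) of the description. */
static void mulscc_step(muls_state *s, uint32_t multiplicand)
{
    uint32_t rs1 = s->rd;

    /* Step 2: shift right by one; (N xor V) replaces bit 31. */
    uint32_t shifted = (rs1 >> 1) | ((uint32_t)(s->n != s->v) << 31);

    /* Step 3: add the multiplicand only if Y{0} = 1. */
    uint32_t addend = (s->y & 1u) ? multiplicand : 0u;
    uint64_t wide = (uint64_t)shifted + addend;
    uint32_t sum = (uint32_t)wide;

    /* Step 4: icc reflects the addition. */
    s->n = (sum >> 31) & 1u;
    s->z = (sum == 0);
    s->v = ((~(shifted ^ addend) & (shifted ^ sum)) >> 31) & 1u;
    s->c = (wide >> 32) & 1u;

    /* Step 5: Y shifts right; unshifted R[rs1]{0} enters Y{31}. */
    s->y = (s->y >> 1) | ((rs1 & 1u) << 31);
    s->rd = sum;
}

/* Assumed driver: 32 steps plus one final shift step (multiplicand 0).
 * Intended for multiplier and multiplicand values below 2^31. */
static uint64_t mulscc_multiply(uint32_t multiplier, uint32_t multiplicand)
{
    muls_state s = { 0, multiplier, false, false, false, false };
    for (int i = 0; i < 32; i++)
        mulscc_step(&s, multiplicand);
    mulscc_step(&s, 0);
    return ((uint64_t)s.rd << 32) | s.y;  /* high bits : low bits in Y */
}
```

After the final step, the high half of the product is in the destination register and the low half is in Y, matching the description of where the product bits end up.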


An attempt to execute a MULScc instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction


See Also RDY on page 287 
SDIV, SDIVcc on page 304 
SMUL, SMULcc on page 311 
UDIV, UDIVcc on page 354 
UMUL, UMULcc on page 356 
MULX / SDIVX / UDIVX


7.66 Multiply and Divide (64-bit)


Instruction  op3      Operation                      Assembly Language Syntax             Class
MULX         00 1001  Multiply (signed or unsigned)  mulx   reg_rs1, reg_or_imm, reg_rd   A1
SDIVX        10 1101  Signed Divide                  sdivx  reg_rs1, reg_or_imm, reg_rd   A1
UDIVX        00 1101  Unsigned Divide                udivx  reg_rs1, reg_or_imm, reg_rd   A1


[Instruction format diagram; bit boundaries at 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0]


Description MULX computes "R[rs1] × R[rs2]" if i = 0 or "R[rs1] × sign_ext(simm13)" if i = 1,
and writes the 64-bit product into R[rd]. MULX can be used to calculate the 64-bit
product for signed or unsigned operands (the product is the same).


SDIVX and UDIVX compute "R[rs1] ÷ R[rs2]" if i = 0 or
"R[rs1] ÷ sign_ext(simm13)" if i = 1, and write the 64-bit result into R[rd]. SDIVX
operates on the operands as signed integers and produces a corresponding signed
result. UDIVX operates on the operands as unsigned integers and produces a
corresponding unsigned result.


For SDIVX, if the largest negative number is divided by -1, the result should be the
largest negative number. That is:


8000 0000 0000 0000₁₆ ÷ FFFF FFFF FFFF FFFF₁₆ = 8000 0000 0000 0000₁₆


These instructions do not modify any condition codes.


An attempt to execute a MULX, SDIVX, or UDIVX instruction when i = 0 and
instruction bits 12:5 are nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction
division_by_zero
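
The SDIVX overflow case needs special handling when modeled in C, because dividing INT64_MIN by -1 is undefined behavior in C even though the instruction defines a result. The sketch below is illustrative only: the function names are invented, a false return stands in for the division_by_zero exception, and it assumes the quotient truncates toward zero as C division does.

```c
#include <stdbool.h>
#include <stdint.h>

/* MULX: the low 64 bits of the product are the same whether the
 * operands are viewed as signed or unsigned. */
static uint64_t mulx_model(uint64_t a, uint64_t b)
{
    return a * b;
}

/* SDIVX: returns false to stand in for division_by_zero. */
static bool sdivx_model(int64_t a, int64_t b, int64_t *q)
{
    if (b == 0)
        return false;
    if (a == INT64_MIN && b == -1) {
        *q = INT64_MIN;   /* defined result; avoids C undefined behavior */
        return true;
    }
    *q = a / b;           /* assumed: truncation toward zero */
    return true;
}

/* UDIVX: returns false to stand in for division_by_zero. */
static bool udivx_model(uint64_t a, uint64_t b, uint64_t *q)
{
    if (b == 0)
        return false;
    *q = a / b;
    return true;
}
```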
NOP


7.67 No Operation


Instruction  op2  Operation     Assembly Language Syntax  Class
NOP          100  No Operation  nop                       A1


| 00 | rd = 00000 | op2 | imm22 = 00 0000 0000 0000 0000 0000 |
 31 30 29       25 24 22 21                                  0


Description The NOP instruction changes no program-visible state (except that of the PC
register).
NOP is a special case of the SETHI instruction, with imm22 = 0 and rd = 0.


Programming | There are many other opcodes that may execute as NOPs; 
Note | however, this dedicated NOP instruction is the only one 
guaranteed to be implemented efficiently across all 
implementations. 


Exceptions None 
NORMALW


7.68 NORMALW


Instruction  Operation                                                   Assembly Language Syntax  Class
NORMALWᴾ     "Other" register windows become "normal" register windows   normalw                   A1


[Instruction format diagram; bit boundaries at 31:30, 29:25, 24:19, 18:0]


Description NORMALWᴾ is a privileged instruction that copies the value of the OTHERWIN
register to the CANRESTORE register, then sets the OTHERWIN register to zero.


Programming | The NORMALW instruction is used when changing address
Notes | spaces. NORMALW indicates the current "other" windows are
now "normal" windows and should use the spill_n_normal and
fill_n_normal traps when they generate a trap due to window spill
or fill exceptions. The window state may become inconsistent if
NORMALW is used when CANRESTORE is nonzero.


In an UltraSPARC Architecture 2005 implementation, this instruction is not
implemented in hardware, causes an illegal_instruction exception, and is emulated in
software.


Exceptions illegal_instruction (not implemented in hardware in UltraSPARC Architecture 2005)
See Also ALLCLEAN on page 136 


INVALW on page 225 
OTHERW on page 276 
RESTORED on page 294 
SAVED on page 302 
OR


7.69 OR Logical Operation


Instruction  op3      Operation                          Assembly Language Syntax             Class
OR           00 0010  Inclusive or                       or     reg_rs1, reg_or_imm, reg_rd   A1
ORcc         01 0010  Inclusive or and modify cc's       orcc   reg_rs1, reg_or_imm, reg_rd   A1
ORN          00 0110  Inclusive or not                   orn    reg_rs1, reg_or_imm, reg_rd   A1
ORNcc        01 0110  Inclusive or not and modify cc's   orncc  reg_rs1, reg_or_imm, reg_rd   A1


[Instruction format diagram; bit boundaries at 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0]


Description These instructions implement bitwise logical or operations. They compute "R[rs1]
op R[rs2]" if i = 0, or "R[rs1] op sign_ext(simm13)" if i = 1, and write the result into
R[rd].


ORcc and ORNcc modify the integer condition codes (icc and xcc). They set the
condition codes as follows:


■ icc.v, icc.c, xcc.v, and xcc.c are set to 0
■ icc.n is copied from bit 31 of the result
■ xcc.n is copied from bit 63 of the result
■ icc.z is set to 1 if bits 31:0 of the result are zero (otherwise to 0)
■ xcc.z is set to 1 if all 64 bits of the result are zero (otherwise to 0)


ORN and ORNcc logically negate their second operand before applying the main 
(or) operation. 
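
The condition-code rules for ORcc and ORNcc translate directly into C. The following sketch is illustrative; the struct layout and function name are assumptions, and the negate flag selects between the OR and ORN forms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Both sets of integer condition codes (names are illustrative). */
typedef struct {
    bool icc_n, icc_z, icc_v, icc_c;
    bool xcc_n, xcc_z, xcc_v, xcc_c;
} ccr_t;

/* Model of ORcc (negate_b = false) and ORNcc (negate_b = true). */
static uint64_t orcc_model(uint64_t a, uint64_t b, bool negate_b, ccr_t *ccr)
{
    /* ORN/ORNcc logically negate the second operand first. */
    uint64_t res = a | (negate_b ? ~b : b);

    ccr->icc_v = ccr->icc_c = false;
    ccr->xcc_v = ccr->xcc_c = false;
    ccr->icc_n = (res >> 31) & 1u;        /* bit 31 of the result  */
    ccr->xcc_n = (res >> 63) & 1u;        /* bit 63 of the result  */
    ccr->icc_z = ((uint32_t)res == 0);    /* bits 31:0 all zero    */
    ccr->xcc_z = (res == 0);              /* all 64 bits zero      */
    return res;
}
```

A result such as 1 0000 0000₁₆ shows how the two sets can differ: icc.z is 1 because the low 32 bits are zero, while xcc.z is 0.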


An attempt to execute an OR[N][cc] instruction when i = 0 and instruction bits 12:5
are nonzero causes an illegal_instruction exception.


Exceptions illegal_instruction
OTHERW


7.70 OTHERW


Instruction  Operation                                                   Assembly Language Syntax  Class
OTHERWᴾ      "Normal" register windows become "other" register windows   otherw                    A1


[Instruction format diagram; bit boundaries at 31:30, 29:25, 24:19, 18:0]


Description OTHERWᴾ is a privileged instruction that copies the value of the CANRESTORE
register to the OTHERWIN register, then sets the CANRESTORE register to zero.


Programming | The OTHERW instruction is used when changing address spaces.
Notes | OTHERW indicates the current "normal" register windows are
now "other" register windows and should use the spill_n_other
and fill_n_other traps when they generate a trap due to window
spill or fill exceptions. The window state may become inconsistent
if OTHERW is used when OTHERWIN is nonzero.
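
The register updates performed by OTHERW, and by its counterpart NORMALW, are simple enough to model directly. The sketch below uses a hypothetical two-field struct for the window-state registers involved; it also shows why the state can become inconsistent, since the previous contents of the destination register are overwritten rather than accumulated.

```c
/* The two window-state registers involved (illustrative struct). */
typedef struct {
    unsigned canrestore;
    unsigned otherwin;
} winstate_t;

/* OTHERW: "normal" windows become "other" windows. */
static void otherw_model(winstate_t *w)
{
    w->otherwin = w->canrestore;  /* any previous OTHERWIN count is lost */
    w->canrestore = 0;
}

/* NORMALW: "other" windows become "normal" windows. */
static void normalw_model(winstate_t *w)
{
    w->canrestore = w->otherwin;  /* any previous CANRESTORE count is lost */
    w->otherwin = 0;
}
```

Applying `otherw_model` and then `normalw_model` to a state with OTHERWIN = 0 returns it to its starting values; applying `otherw_model` when OTHERWIN is already nonzero discards that count.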





In an UltraSPARC Architecture 2005 implementation, this instruction is not
implemented in hardware, causes an illegal_instruction exception, and is emulated in
software.


Exceptions illegal_instruction (not implemented in hardware in UltraSPARC Architecture 2005)
See Also ALLCLEAN on page 136 


INVALW on page 225 
NORMALW on page 274 
RESTORED on page 294 
SAVED on page 302 
PDIST


7.71 Pixel Component Distance (with Accumulation) [VIS 1]


Instruction  opf          Operation                                                    Assembly Language Syntax            Class
PDIST        0 0011 1110  Distance between eight 8-bit components, with accumulation   pdist freg_rs1, freg_rs2, freg_rd   C3


[Instruction format diagram: fields rd, op3 = 11 0110, rs1, opf, rs2; bit boundaries at 31:30, 29:25, 24:19, 18:14, 13:5, 4:0]


Description Eight unsigned 8-bit values are contained in the 64-bit floating-point source registers
Fp[rs1] and Fp[rs2]. The corresponding 8-bit values in the source registers are
subtracted (that is, each byte in Fp[rs2] is subtracted from the corresponding byte in
Fp[rs1]). The sum of the absolute value of each difference is added to the integer in
Fp[rd] and the resulting integer sum is stored in the destination register, Fp[rd].
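
The PDIST computation is a sum of absolute differences over eight byte lanes, accumulated into the destination. A minimal C sketch follows, with the three 64-bit register values passed as plain integers; the function name is hypothetical.

```c
#include <stdint.h>

/* Sum of absolute byte differences between rs1 and rs2, added to the
 * accumulator value that was already in the destination register. */
static uint64_t pdist_model(uint64_t rs1, uint64_t rs2, uint64_t acc)
{
    for (int i = 0; i < 8; i++) {
        int a = (int)((rs1 >> (8 * i)) & 0xFFu);  /* unsigned 8-bit lane */
        int b = (int)((rs2 >> (8 * i)) & 0xFFu);
        acc += (uint64_t)(a > b ? a - b : b - a);
    }
    return acc;
}
```

Because the destination is also a source, repeated calls can be chained to accumulate a distance over a whole block of pixels, which is the motion-estimation use mentioned for this instruction.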


Programming | PDIST uses Fp[rd] as both a source and a destination register.
Notes | Typically, PDIST is used for motion estimation in video
compression algorithms.


In an UltraSPARC Architecture 2005 implementation, this instruction is not
implemented in hardware, causes an illegal_instruction exception, and is emulated in
software.


Exceptions illegal_instruction
POPC


7.72 Population Count


Instruction  op3      Operation         Assembly Language Syntax   Class
POPC         10 1110  Population Count  popc reg_or_imm, reg_rd    C3


[Instruction format diagram; bit boundaries at 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0]


Description POPC counts the number of one bits in R[rs2] if i = 0, or the number of one bits in
sign_ext(simm13) if i = 1, and stores the count in R[rd]. This instruction does not
modify the condition codes.


V9 Compatibility | Instruction bits 18 through 14 must be zero for POPC. Other
Note | encodings of this field (rs1) may be used in future versions of the
SPARC architecture for other instructions.


Programming | POPC can be used to "find first bit set" in a register. A 'C'-
Note | language program illustrating how POPC can be used for this
purpose follows:


int ffs(zz)    /* finds first 1 bit, counting from the LSB */
unsigned zz;
{
    return popc ( zz ^ (~ (-zz)));    /* for nonzero zz */
}


Inline assembly language code for ffs() is:


neg    %IN, %M_IN           ! -zz (2's complement)
xnor   %IN, %M_IN, %TEMP    ! ^ ~ -zz (exclusive nor)
popc   %TEMP, %RESULT       ! result = popc(zz ^ ~ -zz)
movrz  %IN, %g0, %RESULT    ! %RESULT should be 0 for %IN=0


where IN, M_IN, TEMP, and RESULT are integer registers.


Example computation:

IN               = ...00101000   ! 1st '1' bit from right is
-IN              = ...11011000   ! bit 3 (4th bit)
~ -IN            = ...00100111
IN ^ ~ -IN       = ...00001111
popc(IN ^ ~ -IN) = 4
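
The same find-first-set computation can be checked on any host with a software population count. The sketch below is a portable C stand-in, not the instruction itself: popc64 and ffs64 are hypothetical names, the operand is treated as a full 64-bit value, and the zero case is handled explicitly in the way the movrz instruction does in the assembly sequence.

```c
#include <stdint.h>

/* Software population count: clears the lowest set bit each pass. */
static unsigned popc64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Position of the first 1 bit counting from the LSB, starting at 1;
 * returns 0 when no bit is set. */
static unsigned ffs64(uint64_t zz)
{
    if (zz == 0)
        return 0;
    /* zz ^ ~(-zz) sets every bit up to and including the lowest 1 bit. */
    return popc64(zz ^ ~(0 - zz));
}
```

For the example value ...00101000 this returns 4, matching the worked computation.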
Programming | POPC can be used to "centrifuge" all the '1' bits in a register to the
Note | least significant end of a destination register. Assembly-language
code illustrating how POPC can be used for this purpose follows:


popc   %IN, %DEST
cmp    %IN, -1                ! Test for pattern of all 1's
mov    -1, %TEMP              ! Constant -1 -> temp register
sllx   %TEMP, %DEST, %DEST    ! (shift count of 64 same as 0)
not    %DEST                  !
movcc  %xcc, -1, %DEST        ! If src was -1, result is -1


where IN, TEMP, and DEST are integer registers. 
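
The same "centrifuge" operation can be sketched in C. This is an illustration only; popc64 and centrifuge64 are hypothetical names, and popc64 again stands in for POPC. The all-ones input needs a special case here for the same reason the assembly uses a conditional move: a shift count of 64 does not produce zero.

```c
#include <stdint.h>

/* Software stand-in for POPC: count the 1 bits in a 64-bit value. */
static unsigned popc64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;             /* clear the least-significant 1 bit */
        n++;
    }
    return n;
}

/* Gather all 1 bits of in at the least-significant end of the result. */
static uint64_t centrifuge64(uint64_t in)
{
    unsigned n = popc64(in);
    if (n == 64)
        return ~(uint64_t)0;    /* shifting a 64-bit value by 64 is undefined in C */
    return ~(~(uint64_t)0 << n);    /* n low-order 1 bits */
}
```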


Programming | POPC is a "64-bit-only" instruction; there is no version of this 
Note | instruction that operates on just the less-significant 32 bits of its 
source operand. 





In an UltraSPARC Architecture 2005 implementation, this instruction is not
implemented in hardware, causes an illegal_instruction exception, and is emulated in
software.

An attempt to execute a POPC instruction when either instruction bits 18:14 are
nonzero, or i = 0 and instruction bits 12:5 are nonzero causes an illegal_instruction
exception.

Exceptions illegal_instruction
PREFETCH

7.73 Prefetch

Instruction      op3       Operation            Assembly Language Syntax                      Class
PREFETCH         10 1101   Prefetch Data        prefetch [address], prefetch_fcn              A1
PREFETCHA^PASI   11 1101   Prefetch Data from   prefetcha [regaddr] imm_asi, prefetch_fcn     A1
                           Alternate Space      prefetcha [reg_plus_imm] %asi, prefetch_fcn

PREFETCH
  11 |  fcn   |  op3   |  rs1   | i=0 |     —     |  rs2
  11 |  fcn   |  op3   |  rs1   | i=1 |       simm13
 31 30 29    25 24    19 18    14  13   12       5 4     0

PREFETCHA
  11 |  fcn   |  op3   |  rs1   | i=0 |  imm_asi  |  rs2
  11 |  fcn   |  op3   |  rs1   | i=1 |       simm13
 31 30 29    25 24    19 18    14  13   12       5 4     0

TABLE 7-10 Prefetch Variants, by Function Code

fcn                     Prefetch Variant
0                       (Weak) Prefetch for several reads
1                       (Weak) Prefetch for one read
2                       (Weak) Prefetch for several writes and possibly reads
3                       (Weak) Prefetch for one write
4                       Prefetch page
5-15 (05_16-0F_16)      Reserved (illegal_instruction)
16 (10_16)              Implementation dependent (NOP if not implemented)
17 (11_16)              Prefetch to nearest unified cache
18-19 (12_16-13_16)     Implementation dependent (NOP if not implemented)
20 (14_16)              Strong Prefetch for several reads
21 (15_16)              Strong Prefetch for one read
22 (16_16)              Strong Prefetch for several writes and possibly reads
23 (17_16)              Strong Prefetch for one write
24-31 (18_16-1F_16)     Implementation dependent (NOP if not implemented)
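
The grouping of fcn values in TABLE 7-10 can be expressed as a small decoder. The following C sketch is illustrative only; the enumeration and function names are hypothetical, and the code models the table's grouping, not the behavior of any particular implementation.

```c
/* Groups of prefetch function codes, following TABLE 7-10. */
enum prefetch_kind {
    PF_WEAK,            /* fcn 0-3                                  */
    PF_PAGE,            /* fcn 4                                    */
    PF_RESERVED,        /* fcn 5-15: illegal_instruction            */
    PF_IMPL_DEP,        /* fcn 16, 18-19, 24-31: NOP if unimplemented */
    PF_UNIFIED_CACHE,   /* fcn 17                                   */
    PF_STRONG           /* fcn 20-23                                */
};

static enum prefetch_kind classify_fcn(unsigned fcn)
{
    fcn &= 0x1F;                        /* fcn is a 5-bit field */
    if (fcn <= 3)
        return PF_WEAK;
    if (fcn == 4)
        return PF_PAGE;
    if (fcn <= 15)
        return PF_RESERVED;
    if (fcn == 17)
        return PF_UNIFIED_CACHE;
    if (fcn >= 20 && fcn <= 23)
        return PF_STRONG;
    return PF_IMPL_DEP;                 /* 16, 18, 19, 24-31 */
}
```

Note that each Strong variant is the corresponding Weak variant's fcn value plus 20.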
Description A PREFETCH[A] instruction provides a hint to the virtual processor that software
expects to access a particular address in memory in the near future, so that the
virtual processor may take action to reduce the latency of accesses near that address.
Typically, execution of a prefetch instruction initiates movement of a block of data
containing the addressed byte from memory toward the virtual processor or creates
an address mapping.


Implementation | A PREFETCH[A] instruction may be used by software to:
Note |   * prefetch a cache line into a cache
     |   * prefetch a valid address translation into a TLB


If i = 0, the effective address operand for the PREFETCH instruction is
"R[rs1] + R[rs2]"; if i = 1, it is "R[rs1] + sign_ext(simm13)".

PREFETCH instructions access the primary address space
(ASI_PRIMARY[_LITTLE]).

PREFETCHA instructions access an alternate address space. If i = 0, the address
space identifier (ASI) to be used for the instruction is in the imm_asi field. If i = 1, the
ASI is found in the ASI register.
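
The effective-address rule can be illustrated with a short C sketch. It assumes the field positions of the instruction format shown earlier in this section (rs1 in bits 18:14, i in bit 13, simm13 in bits 12:0, rs2 in bits 4:0); the function names and the array r, which models the integer registers R[0] through R[31], are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the 13-bit simm13 field (instruction bits 12:0). */
static int64_t sign_ext13(uint32_t insn)
{
    int64_t v = (int64_t)(insn & 0x1FFF);
    return (v & 0x1000) ? v - 0x2000 : v;
}

/* Effective address of a PREFETCH: R[rs1] + R[rs2] if i = 0,
 * R[rs1] + sign_ext(simm13) if i = 1. */
static uint64_t prefetch_ea(uint32_t insn, const uint64_t r[32])
{
    unsigned rs1 = (insn >> 14) & 0x1F;
    unsigned i   = (insn >> 13) & 0x1;
    unsigned rs2 = insn & 0x1F;
    uint64_t operand2 = i ? (uint64_t)sign_ext13(insn) : r[rs2];
    return r[rs1] + operand2;           /* wraps modulo 2^64 */
}
```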


A prefetch operates much the same as a regular load operation, but with certain 
important differences. In particular, a PREFETCH[A] instruction is non-blocking; 
subsequent instructions can continue to execute while the prefetch is in progress. 


When executed in nonprivileged or privileged mode, PREFETCH[A] has the same 
observable effect as a NOP. A prefetch instruction will not cause a trap if applied to 
an illegal or nonexistent memory address. (impl. dep. #103-V9-Ms10(e)) 


IMPL. DEP. #103-V9-Ms10(a): The size and alignment in memory of the data block 
prefetched is implementation dependent; the minimum size is 64 bytes and the 
minimum alignment is a 64-byte boundary. 
Programming | Software may prefetch 64 bytes beginning at an arbitrary
Note | address by issuing the instructions

    prefetch [address], prefetch_fcn
    prefetch [address + 63], prefetch_fcn
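
Why two prefetches suffice follows from the minimum block size and alignment of 64 bytes: the bytes address through address + 63 lie in at most two 64-byte-aligned blocks. A minimal C sketch of that arithmetic, with hypothetical names and assuming exactly the minimum 64-byte block:

```c
#include <stdint.h>

#define PREFETCH_BLOCK 64u      /* minimum block size and alignment */

/* Base address of the 64-byte-aligned block containing addr. */
static uint64_t block_base(uint64_t addr)
{
    return addr & ~(uint64_t)(PREFETCH_BLOCK - 1);
}

/* Number of distinct blocks named by the two prefetches of
 * address and address + 63: 1 if address is aligned, otherwise 2. */
static unsigned blocks_touched(uint64_t addr)
{
    return block_base(addr) == block_base(addr + 63) ? 1u : 2u;
}
```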


Variants of the prefetch instruction can be used to prepare the memory system for 
different types of accesses. 


IMPL. DEP. #103-V9-Ms10(b): An implementation may implement none, some, or 
all of the defined PREFETCH[A] variants. It is implementation-dependent whether 
each variant is (1) not implemented and executes as a NOP, (2) is implemented and 
supports the full semantics for that variant, or (3) is implemented and only supports 
the simple common-case prefetching semantics for that variant. 
7.73.1 Exceptions


Prefetch instructions PREFETCH and PREFETCHA generate exceptions under the 
conditions detailed in TABLE 7-11. Only the implementation-dependent prefetch 
variants (see TABLE 7-10) may generate an exception under conditions not listed in 
this table; the predefined variants only generate the exceptions listed here. 


TABLE 7-11 Behavior of PREFETCH[A] Instructions Under Exceptional Conditions

fcn             Instruction   Condition                                  Result
any             PREFETCH      i = 0 and instruction bits 12:5 are        illegal_instruction
                              nonzero
any             PREFETCHA     reference to an ASI in the range           executes as NOP
                              0_16-7F_16, while in nonprivileged
                              mode (privileged_action condition)
any             PREFETCHA     reference to an ASI in the range           executes as NOP
                              30_16-7F_16, while in privileged
                              mode (privileged_action condition)
0-3 (weak)      PREFETCH[A]   condition detected for MMU miss            executes as NOP
0-4             PREFETCH[A]   variant unimplemented                      executes as NOP
0-4             PREFETCHA     reference to an invalid ASI                executes as NOP
                              (ASI not listed in following table)
0-4, 17,        PREFETCH[A]   condition detected for ((TTE.cp = 0)       executes as NOP
20-23                         or ((fcn = 0) and TTE.cv = 0)), or
                              (TTE.e = 1)
4, 20-23        PREFETCH[A]   prefetching the requested data             executes as NOP
(strong)                      would be a very time-consuming
                              operation
5-15            PREFETCH[A]   (always)                                   illegal_instruction
(05_16-0F_16)
16-31           PREFETCH[A]   variant unimplemented                      executes as NOP

ASIs valid for PREFETCHA (all others are invalid)

ASI_NUCLEUS                   ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY        ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY      ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_PRIMARY                   ASI_PRIMARY_LITTLE
ASI_SECONDARY                 ASI_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT          ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT        ASI_SECONDARY_NO_FAULT_LITTLE
ASI_REAL                      ASI_REAL_LITTLE
7.73.2 Weak versus Strong Prefetches


Some prefetch variants are available in two versions, “Weak” and “Strong”.

From software’s perspective, the difference between the two is the degree of
certainty that the data being prefetched will subsequently be accessed. That, in
turn, affects the amount of effort (time) that software is willing to have the
underlying hardware invest to perform the prefetch. If the prefetch is speculative
(software believes the data will probably be needed, but isn’t sure), a Weak prefetch
will initiate data movement if the operation can be performed quickly, but abort the
prefetch and behave like a NOP if it turns out that performing the full prefetch will
be time-consuming. If software has very high confidence that data being prefetched
will subsequently be accessed, then a Strong prefetch requests that the prefetch
operation continue, even if it becomes time-consuming.


From the virtual processor’s perspective, the difference between a Weak and a 
Strong prefetch is whether the prefetch is allowed to perform a time-consuming 
operation in order to complete. If a time-consuming operation is required, a Weak 
prefetch will abandon the operation and behave like a NOP while a Strong prefetch 
may pay the cost of performing the time-consuming operation so it can finish 
initiating the requested data movement. Behavioral differences among loads and 
prefetches are compared in TABLE 7-12. 


TABLE 7-12 Comparative Behavior of Load and Weak Prefetch Operations

                                                                        Behavior
Condition                                                            Load     Prefetch
Upon detection of privileged_action, data_access_exception,          Traps    NOP
or VA_watchpoint exception...
If page table entry has cp = 0, e = 1, and cv = 0 for Prefetch for   Traps    NOP‡
Several Reads
If page table entry has nfo = 1 for a non-NoFault access...           Traps    NOP‡
If page table entry has w = 0 for any prefetch for write access      Traps    NOP
(fcn = 2, 3, 22, or 23)...
Instruction blocks until cache line filled?                          Yes      No








7.73.3 Prefetch Variants

The prefetch variant is selected by the fcn field of the instruction. fcn values 5-15 are
reserved for future extensions of the architecture, and PREFETCH fcn values of 16-19
and 24-31 are implementation dependent in UltraSPARC Architecture 2005.


Each prefetch variant reflects an intent on the part of the compiler or programmer, a 
“hint” to the underlying virtual processor. This is different from other instructions 
(except BPN), all of which cause specific actions to occur. An UltraSPARC 
Architecture implementation may implement a prefetch variant by any technique, as 
long as the intent of the variant is achieved (impl. dep. #103-V9-Ms10(b)). 


The prefetch instruction is designed to treat common cases well. The variants are 
intended to provide scalability for future improvements in both hardware and 
compilers. If a variant is implemented, it should have the effects described below. In 
case some of the variants listed below are implemented and some are not, a 
recommended overloading of the unimplemented variants is provided in the SPARC 
V9 specification. An implementation must treat any unimplemented prefetch fcn 
values as NOPs (impl. dep. #103-V9-Ms10). 


7.73.3.1 Prefetch for Several Reads (fcn = 0, 20 (14_16))


The intent of these variants is to cause movement of data into the cache nearest the 
virtual processor. 


There are Weak and Strong versions of this prefetch variant; fcn = 0 is Weak and 
fcn = 20 is Strong. The choice of Weak or Strong variant controls the degree of effort 
that the virtual processor may expend to obtain the data. 


Programming | The intended use of this variant is for streaming relatively small 
Note | amounts of data into the primary data cache of the virtual 
processor. 


7.73.3.2 Prefetch for One Read (fcn = 1, 21 (15_16))


The data to be read from the given address are expected to be read once and not 
reused (read or written) soon after that. Use of this PREFETCH variant indicates 
that, if possible, the data cache should be minimally disturbed by the data read from 
the given address. 


There are Weak and Strong versions of this prefetch variant; fcn = 1 is Weak and 
fcn = 21 is Strong. The choice of Weak or Strong variant controls the degree of effort 
that the virtual processor may expend to obtain the data. 


Programming | The intended use of this variant is in streaming medium amounts 
Note | of data into the virtual processor without disturbing the data in 
the primary data cache memory. 


7.73.3.3 Prefetch for Several Writes (and Possibly Reads)
(fcn = 2, 22 (16_16))


The intent of this variant is to cause movement of data in preparation for multiple 
writes. 
There are Weak and Strong versions of this prefetch variant; fcn = 2 is Weak and
fcn = 22 is Strong. The choice of Weak or Strong variant controls the degree of effort
that the virtual processor may expend to obtain the data.


Programming | An example use of this variant is to initialize a cache line, in 
Note | preparation for a partial write. 


Implementation | On a multiprocessor system, this variant indicates that exclusive 

Note | ownership of the addressed data is needed. Therefore, it may 
have the additional effect of obtaining exclusive ownership of the 
addressed cache line. 





7.73.3.4 Prefetch for One Write (fcn = 3, 23 (17_16))


The intent of this variant is to initiate movement of data in preparation for a single 
write. This variant indicates that, if possible, the data cache should be minimally 
disturbed by the data written to this address, because those data are not expected to 
be reused (read or written) soon after they have been written once. 


There are Weak and Strong versions of this prefetch variant; fcn = 3 is Weak and 
fcn = 23 is Strong. The choice of Weak or Strong variant controls the degree of effort 
that the virtual processor may expend to obtain the data. 


7.73.3.5 Prefetch Page (fcn = 4) 


In a virtual memory system, the intended action of this variant is for hardware (or 
privileged or hyperprivileged software) to initiate asynchronous mapping of the 
referenced virtual address (assuming that it is legal to do so). 


Programming | Prefetch Page is used to avoid a later page fault for the given
Note | address, or at least to shorten the latency of a page fault.





In a non-virtual-memory system or if the addressed page is already mapped, this 
variant has no effect. 


Implementation 
Note 


The mapping required by Prefetch Page may be performed by 
privileged software, hyperprivileged software, or hardware. 





7.73.4 Implementation-Dependent Prefetch Variants
(fcn = 16, 18, 19, and 24-31)

IMPL. DEP. #103-V9-Ms10(c): Whether and how PREFETCH fcns 16, 18, 19, and 24-31
are implemented is implementation dependent. If a variant is not implemented,
it must execute as a NOP.
7.73.5 Additional Notes


Programming | Prefetch instructions do have some “cost to execute”. As long as
Note | the cost of executing a prefetch instruction is well below the
cost of a cache miss, use of prefetching provides a net gain in
performance.

It does not appear that prefetching causes a significant number of
useless fetches from memory, though it may increase the rate of
useful fetches (and hence the bandwidth), because it more
efficiently overlaps computing with fetching.


Programming | A compiler that generates PREFETCH instructions should 

Note | generate each of the variants where its use is most appropriate. 
That will help portable software be reasonably efficient across a 
range of hardware configurations. 





Implementation | Any effects of a data prefetch operation in privileged code should 
Note | be reasonable (for example, no page prefetching is allowed within 
code that handles page faults). The benefits of prefetching should 

be available to most privileged code. 


Implementation | A prefetch from a nonprefetchable location has no effect. It is up 
Note | to memory management hardware to determine how locations 
are identified as not prefetchable. 


Exceptions illegal_instruction 
RDasr

7.74 Read Ancillary State Register

Instruction      rs1     Operation                                        Assembly Language Syntax   Class
RDY^D            0       Read Y register (deprecated)                     rd %y, reg_rd              D2
—                1       Reserved
RDCCR            2       Read Condition Codes register (CCR)              rd %ccr, reg_rd            A1
RDASI            3       Read ASI register                                rd %asi, reg_rd            A1
RDTICK^Pnpt      4       Read TICK register                               rd %tick, reg_rd           A1
RDPC             5       Read Program Counter (PC)                        rd %pc, reg_rd             A2
RDFPRS           6       Read Floating-Point Registers Status (FPRS)      rd %fprs, reg_rd           A1
                         register
—                7-14    Reserved
See text         15      MEMBAR or Reserved; see text
RDPCR^P          16      Read Performance Control registers (PCR)         rd %pcr, reg_rd            A1
RDPIC^Ppic       17      Read Performance Instrumentation Counters        rd %pic, reg_rd            A1
                         register (PIC)
—                18      Reserved (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
RDGSR            19      Read General Status register (GSR)               rd %gsr, reg_rd            A1
—                20-21   Reserved (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
RDSOFTINT^P      22      Read per-virtual processor Soft Interrupt        rd %softint, reg_rd        N2
                         register (SOFTINT)
RDTICK_CMPR^P    23      Read Tick Compare register (TICK_CMPR)           rd %tick_cmpr, reg_rd      N2
RDSTICK^Pnpt     24      Read System Tick Register (STICK)                rd %stick†, reg_rd         N2
RDSTICK_CMPR^P   25      Read System Tick Compare register                rd %stick_cmpr†, reg_rd    N2
                         (STICK_CMPR)
—                26-27   Reserved (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
—                28      Implementation dependent
                         (impl. dep. #8-V8-Cs20, #9-V8-Cs20)
—                29-31   Implementation dependent
                         (impl. dep. #8-V8-Cs20, #9-V8-Cs20)

† The original assembly language names for %stick and %stick_cmpr were, respectively, %sys_tick and %sys_tick_cmpr, which are
now deprecated. Over time, assemblers will support the new %stick and %stick_cmpr names for these registers (which are consistent
with %tick and %tick_cmpr). In the meantime, some existing assemblers may only recognize the original names.

  10 |   rd   | 10 1000 |  rs1   | i=0 |          —
 31 30 29    25 24     19 18    14  13   12              0
Description The Read Ancillary State Register (RDasr) instructions copy the contents of the state
register specified by rs1 into R[rd].


An RDasr instruction with rs1 = 0 is a (deprecated) RDY instruction (which should 
not be used in new software). 


The RDY instruction is deprecated. It is recommended that all instructions that 
reference the Y register be avoided. 


RDPC copies the contents of the PC register into R[rd]. If PSTATE.am = 0, the full 
64-bit address is copied into R[rd]. If PSTATE.am = 1, only a 32-bit address is saved; 
PC{31:0} is copied to R[rd]{31:0} and R[rd]{63:32} is set to 0. (closed impl. dep. #125- 
V9-Cs10) 
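
The effect of PSTATE.am on the value delivered by RDPC can be stated in one line of C. This is a sketch only; rdpc_result is a hypothetical name and the arguments model the PC and the PSTATE.am bit.

```c
#include <stdint.h>

/* Value written to R[rd] by RDPC: the full 64-bit PC when PSTATE.am = 0,
 * or PC{31:0} zero-extended to 64 bits when PSTATE.am = 1. */
static uint64_t rdpc_result(uint64_t pc, int pstate_am)
{
    return pstate_am ? (pc & 0xFFFFFFFFull) : pc;
}
```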


RDFPRS waits for all pending FPops and loads of floating-point registers to 
complete before reading the FPRS register. 


The following values of rs1 are reserved for future versions of the architecture: 1, 7- 
14, 18, 20-21, and 26-27. 


IMPL. DEP. #47-V8-Cs20: RDasr instructions with rs1 in the range 28-31 are
available for implementation-dependent uses (impl. dep. #8-V8-Cs20). For an RDasr
instruction with rs1 in the range 28-31, the following are implementation
dependent:

■ the interpretation of bits 13:0 and 29:25 in the instruction
■ whether the instruction is nonprivileged or privileged (impl. dep. #9-V8-Cs20),
  and
■ whether an attempt to execute the instruction causes an illegal_instruction
  exception.


Implementation | See the section "Read/Write Ancillary State Registers (ASRs)" in 

Note | Extending the UltraSPARC Architecture, contained in the separate 
volume UltraSPARC Architecture Application Notes, for a 
discussion of extending the SPARC V9 instruction set using read / 
write ASR instructions. 


Note | Ancillary state registers may include (for example) timer, counter, 
diagnostic, self-test, and trap-control registers. 


SPARC V8 | The SPARC V8 RDPSR, RDWIM, and RDTBR instructions do not 
Compatibility | exist in the UltraSPARC Architecture, since the PSR, WIM, and 
Note | TBR registers do not exist. 





See Ancillary State Registers on page 67 for more detailed information regarding ASR 
registers. 
Exceptions. An attempt to execute a RDasr instruction when any of the following
conditions is true causes an illegal_instruction exception:

■ rs1 = 15 and rd ≠ 0 (reserved for future versions of the architecture)
■ rs1 = 1, 7-14, 18, 20-21, or 26-27 (reserved for future versions of the architecture)
■ instruction bits 13:0 are nonzero


An attempt to execute a RDPCR (impl. dep. #250-U3-Cs10), RDSOFTINT,
RDTICK_CMPR, RDSTICK, or RDSTICK_CMPR instruction in nonprivileged mode
(PSTATE.priv = 0) causes a privileged_opcode exception (impl. dep. #250-U3-Cs10).

If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present,
an attempt to execute a RDGSR instruction causes an fp_disabled exception.

In nonprivileged mode (PSTATE.priv = 0), the following cause a privileged_action
exception:

■ execution of RDTICK when nonprivileged access to TICK is disabled
■ execution of RDSTICK when nonprivileged access to STICK is disabled
■ execution of RDPIC when nonprivileged access to PIC is disabled (PCR.priv = 1)

Implementation | RDasr shares an opcode with MEMBAR; it is distinguished by
Note | rs1 = 15 or rd = 0 or (i = 0, and bit 12 = 0).
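
The encoding-based illegal_instruction conditions for RDasr can be collected into a small decoder. The C sketch below is illustrative only: the names are hypothetical, privilege and FPU checks are not modeled, the rs1 = 15, rd = 0 case is simply reported as belonging to the MEMBAR encoding space, and rs1 values 28-31 are reported as implementation dependent rather than decoded further.

```c
#include <stdint.h>

enum rdasr_decode {
    RDASR_OK,               /* a defined RDasr encoding                 */
    RDASR_ILLEGAL,          /* causes illegal_instruction               */
    RDASR_MEMBAR_SPACE,     /* rs1 = 15, rd = 0: not an RDasr           */
    RDASR_IMPL_DEP          /* rs1 = 28-31: implementation dependent    */
};

/* rs1 values reserved for future versions of the architecture. */
static int rs1_reserved(unsigned rs1)
{
    return rs1 == 1 || (rs1 >= 7 && rs1 <= 14) || rs1 == 18 ||
           rs1 == 20 || rs1 == 21 || rs1 == 26 || rs1 == 27;
}

static enum rdasr_decode decode_rdasr(uint32_t insn)
{
    unsigned rd  = (insn >> 25) & 0x1F;
    unsigned rs1 = (insn >> 14) & 0x1F;

    if (rs1 == 15)
        return rd != 0 ? RDASR_ILLEGAL : RDASR_MEMBAR_SPACE;
    if (rs1 >= 28)
        return RDASR_IMPL_DEP;
    if (rs1_reserved(rs1))
        return RDASR_ILLEGAL;
    if ((insn & 0x3FFF) != 0)           /* instruction bits 13:0 */
        return RDASR_ILLEGAL;
    return RDASR_OK;
}
```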


Exceptions illegal_instruction
           privileged_opcode
           fp_disabled
           privileged_action

See Also RDPR on page 290
         WRasr on page 358
RDPR

7.75 Read Privileged Register

Instruction   op3       Operation                  rs1     Assembly Language Syntax     Class
RDPR^P        10 1010   Read Privileged register                                        N2
                          TPC                      0       rdpr %tpc, reg_rd
                          TNPC                     1       rdpr %tnpc, reg_rd
                          TSTATE                   2       rdpr %tstate, reg_rd
                          TT                       3       rdpr %tt, reg_rd
                          TICK                     4       rdpr %tick, reg_rd
                          TBA                      5       rdpr %tba, reg_rd
                          PSTATE                   6       rdpr %pstate, reg_rd
                          TL                       7       rdpr %tl, reg_rd
                          PIL                      8       rdpr %pil, reg_rd
                          CWP                      9       rdpr %cwp, reg_rd
                          CANSAVE                  10      rdpr %cansave, reg_rd
                          CANRESTORE               11      rdpr %canrestore, reg_rd
                          CLEANWIN                 12      rdpr %cleanwin, reg_rd
                          OTHERWIN                 13      rdpr %otherwin, reg_rd
                          WSTATE                   14      rdpr %wstate, reg_rd
                          Reserved                 15
                          GL                       16      rdpr %gl, reg_rd
                          Reserved                 17-31

  10 |   rd   |  op3   |  rs1   |          —
 31 30 29    25 24    19 18    14 13              0

Description The rs1 field in the instruction determines the privileged register that is read. There
are MAXPTL copies of the TPC, TNPC, TT, and TSTATE registers. A read from one of
these registers returns the value in the register indexed by the current value in the
trap level register (TL). A read of TPC, TNPC, TT, or TSTATE when the trap level is
zero (TL = 0) causes an illegal_instruction exception.


An attempt to execute a RDPR instruction when any of the following conditions
exist causes an illegal_instruction exception:

■ instruction bits 13:0 are nonzero
■ rs1 = 15, or 17 ≤ rs1 ≤ 31 (reserved rs1 values)
■ 0 ≤ rs1 ≤ 3 (attempt to read TPC, TNPC, TSTATE, or TT register) while TL = 0
  (current trap level is zero) and the virtual processor is in privileged mode.

Implementation | In nonprivileged mode, illegal_instruction exception due to
Note | 0 ≤ rs1 ≤ 3 and TL = 0 does not occur; the privileged_opcode
exception occurs instead.

An attempt to execute a RDPR instruction in nonprivileged mode (PSTATE.priv = 0)
causes a privileged_opcode exception.
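
The interaction of these conditions can be sketched as a decision function in C. The names below are hypothetical. The ordering is an assumption made for illustration: encoding errors (nonzero bits 13:0, reserved rs1) are checked first, then privilege, and the TL = 0 check applies only in privileged mode, as the Implementation Note states. The text does not spell out the relative order of the first two checks against privileged_opcode.

```c
#include <stdint.h>

enum rdpr_result {
    RDPR_OK,
    RDPR_ILLEGAL_INSTRUCTION,
    RDPR_PRIVILEGED_OPCODE
};

/* insn: the RDPR instruction word; priv: PSTATE.priv; tl: current TL. */
static enum rdpr_result rdpr_check(uint32_t insn, int priv, unsigned tl)
{
    unsigned rs1 = (insn >> 14) & 0x1F;

    if ((insn & 0x3FFF) != 0)           /* instruction bits 13:0 nonzero */
        return RDPR_ILLEGAL_INSTRUCTION;
    if (rs1 == 15 || rs1 >= 17)         /* reserved rs1 values */
        return RDPR_ILLEGAL_INSTRUCTION;
    if (!priv)                          /* assumed to follow encoding checks */
        return RDPR_PRIVILEGED_OPCODE;
    if (rs1 <= 3 && tl == 0)            /* TPC, TNPC, TSTATE, TT at TL = 0 */
        return RDPR_ILLEGAL_INSTRUCTION;
    return RDPR_OK;
}
```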
Historical Note | On some early SPARC implementations, floating-point exceptions
could cause deferred traps. To ensure that execution could be
correctly resumed after handling a deferred trap, hardware
provided a floating-point queue (FQ), from which the address of
the trapping instruction could be obtained by the trap handler.
The front of the FQ was accessed by executing a RDPR instruction
with rs1 = 15.

On UltraSPARC Architecture implementations, all floating-point
traps are precise. When one occurs, the address of a trapping
instruction can be found by the trap handler in the TPC[TL], so no
floating-point queue (FQ) is needed or implemented (impl. dep.
#25-V8) and RDPR with rs1 = 15 generates an illegal_instruction
exception.

Exceptions illegal_instruction
           privileged_opcode

See Also RDasr on page 287
         WRPR on page 361
RESTORE

7.76 RESTORE

Instruction   op3       Operation                 Assembly Language Syntax               Class
RESTORE       11 1101   Restore Caller’s Window   restore reg_rs1, reg_or_imm, reg_rd    A1

  10 |   rd   |  op3   |  rs1   | i=0 |     —     |  rs2
  10 |   rd   |  op3   |  rs1   | i=1 |       simm13
 31 30 29    25 24    19 18    14  13   12       5 4     0


Description The RESTORE instruction restores the register window saved by the last SAVE 
instruction executed by the current process. The in registers of the old window 
become the out registers of the new window. The in and local registers in the new 
window contain the previous values. 


Furthermore, if and only if a fill trap is not generated, RESTORE behaves like a 
normal ADD instruction, except that the source operands R[rs1] or R[rs2] are read 
from the old window (that is, the window addressed by the original CWP) and the 
sum is written into R[rd] of the new window (that is, the window addressed by the 
new CWP). 


Note | CWP arithmetic is performed modulo the number of implemented 
windows, N_REG_WINDOWS. 


Programming | Typically, if a RESTORE instruction traps, the fill trap handler 

Notes | returns to the trapped instruction to reexecute it. So, although the 
ADD operation is not performed the first time (when the 
instruction traps), it is performed the second time the instruction 
executes. The same applies to changing the CWP. 


There is a performance trade-off to consider between using SAVE/ 
RESTORE and saving and restoring selected registers explicitly. 





Description (Effect on Privileged State) 
If a RESTORE instruction does not trap, it decrements the CWP (mod 
N_REG_WINDOWS) to restore the register window that was in use prior to the last 
SAVE instruction executed by the current process. It also updates the state of the 
register windows by decrementing CANRESTORE and incrementing CANSAVE. 
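
The register-window bookkeeping of a RESTORE that does not trap can be sketched in C. The structure and function names are hypothetical. The sketch returns 0 when CANRESTORE = 0, the case in which a fill trap is generated instead; it does not model trap entry or the ADD operation.

```c
/* Subset of register-window state touched by RESTORE. */
struct win_state {
    unsigned cwp;
    unsigned cansave;
    unsigned canrestore;
};

/* Returns 1 and updates the state if RESTORE completes;
 * returns 0 and leaves the state unchanged if a fill trap is needed. */
static int restore_update(struct win_state *w, unsigned n_reg_windows)
{
    if (w->canrestore == 0)
        return 0;                       /* fill trap instead */
    w->cwp = (w->cwp + n_reg_windows - 1) % n_reg_windows;  /* CWP - 1, modulo */
    w->canrestore--;
    w->cansave++;
    return 1;
}
```

Adding n_reg_windows before subtracting keeps the unsigned arithmetic from wrapping below zero when CWP is 0.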
If the register window to be restored has been spilled (CANRESTORE = 0), then a
fill trap is generated. The trap vector for the fill trap is based on the values of
OTHERWIN and WSTATE, as described in Trap Type for Spill/Fill Traps on page 442.
The fill trap handler is invoked with CWP set to point to the window to be filled,
that is, old CWP - 1.


Programming | The vectoring of fill traps can be controlled by setting the value of 

Note | the OTHERWIN and WSTATE registers appropriately. For details, 
see the section “Splitting the Register Windows” in Software 
Considerations, contained in the separate volume UltraSPARC 
Architecture Application Notes. 


The fill handler normally will end with a RESTORED instruction 
followed by a RETRY instruction. 





An attempt to execute a RESTORE instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.

Exceptions illegal_instruction
           fill_n_normal (n = 0-7)
           fill_n_other (n = 0-7)

See Also SAVE on page 300
RESTORED

7.77 RESTORED

Instruction   Operation                  Assembly Language Syntax   Class
RESTORED^P    Window has been restored   restored                   A1

  10 | fcn = 0 0001 | 11 0001 |           —
 31 30 29        25 24     19 18               0


Description RESTORED adjusts the state of the register-windows control registers.

RESTORED increments CANRESTORE.

If CLEANWIN < (N_REG_WINDOWS-1), then RESTORED increments CLEANWIN.

If OTHERWIN = 0, RESTORED decrements CANSAVE. If OTHERWIN ≠ 0, it
decrements OTHERWIN.
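
These three register updates can be written out as a short C sketch. The structure and function names are hypothetical, and the sketch assumes the window state is valid before the call; it performs no checking of its own.

```c
/* Register-window control registers adjusted by RESTORED. */
struct win_regs {
    unsigned cansave;
    unsigned canrestore;
    unsigned cleanwin;
    unsigned otherwin;
};

static void restored_update(struct win_regs *w, unsigned n_reg_windows)
{
    w->canrestore++;
    if (w->cleanwin < n_reg_windows - 1)
        w->cleanwin++;
    if (w->otherwin == 0)
        w->cansave--;       /* window came from this address space */
    else
        w->otherwin--;      /* window belonged to the "other" set  */
}
```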


Programming | Trap handler software for register window fills uses the
Notes | RESTORED instruction to indicate that a window has been filled
successfully. For details, see the section “Example Code for Spill
Handler” in Software Considerations, contained in the separate
volume UltraSPARC Architecture Application Notes.


Normal privileged software would probably not execute a 
RESTORED instruction from trap level zero (TL = 0). However, it 
is not illegal to do so and doing so does not cause a trap. 


Executing a RESTORED instruction outside of a window fill trap 
handler is likely to create an inconsistent window state. Hardware 
will not signal an exception, however, since maintaining a 
consistent window state is the responsibility of privileged 
software. 





If CANSAVE = 0 or CANRESTORE ≥ (N_REG_WINDOWS − 2) just prior to execution of
a RESTORED instruction, the subsequent behavior of the processor is undefined. In
neither of these cases can RESTORED generate a register window state that is both
valid (see Register Window State Definition on page 85) and consistent with the state
prior to the RESTORED.

An attempt to execute a RESTORED instruction when instruction bits 18:0 are
nonzero causes an illegal_instruction exception.

An attempt to execute a RESTORED instruction in nonprivileged mode (PSTATE.priv
= 0) causes a privileged_opcode exception.

Exceptions illegal_instruction
           privileged_opcode

See Also ALLCLEAN on page 136
         INVALW on page 225
         NORMALW on page 274
         OTHERW on page 276
         SAVED on page 302
RETRY

7.78 RETRY

Instruction   op3       Operation                                       Assembly Language Syntax   Class
RETRY^P       11 1110   Return from Trap (retry trapped instruction)    retry                      A1

  10 | fcn = 0 0001 |  op3   |           —
 31 30 29        25 24    19 18               0





Description The RETRY instruction restores the saved state from TSTATE[TL] (GL, CCR, ASI,
PSTATE, and CWP), sets PC and NPC, and decrements TL. RETRY sets
PC ← TPC[TL] and NPC ← TNPC[TL] (normally, the values of PC and NPC saved at
the time of the original trap).


Programming | The DONE and RETRY instructions are used to return from 
Note | privileged trap handlers. 


If the saved TPC[TL] and TNPC[TL] were not altered by trap handler software, 
RETRY causes execution to resume at the instruction that originally caused the trap 
("retrying" it). 


Execution of a RETRY instruction in the delay slot of a control-transfer instruction 
produces undefined results. 


If software writes invalid or inconsistent state to TSTATE before executing RETRY, 
virtual processor behavior during and after execution of the RETRY instruction is 
undefined. 


When PSTATE.am = 1, the more-significant 32 bits of the target instruction address 
are masked out (set to 0) before being sent to the memory system. 


IMPL. DEP. #417-S10: If (1) TSTATE[TL].pstate.am = 1 and (2) a RETRY instruction 
is executed (which sets PSTATE.am to '1' by restoring the value from 
TSTATE[TL].pstate.am to PSTATE.am), it is implementation dependent whether the 
RETRY instruction masks (zeroes) the more-significant 32 bits of the values it places 
into PC and NPC. 


Exceptions. An attempt to execute the RETRY instruction when the following
condition is true causes an illegal_instruction exception:

■ TL = 0 and the virtual processor is in privileged mode (PSTATE.priv = 1)

An attempt to execute a RETRY instruction in nonprivileged mode (PSTATE.priv = 0)
causes a privileged_opcode exception.

Implementation | In nonprivileged mode, illegal_instruction exception due to TL = 0
Note | does not occur. The privileged_opcode exception occurs instead,
regardless of the current trap level (TL).

Exceptions illegal_instruction
           privileged_opcode

See Also DONE on page 154
RETURN

7.79 RETURN

Instruction   op3       Operation   Assembly Language Syntax   Class
RETURN        11 1001   Return      return address             A1

  10 |   —    |  op3   |  rs1   | i=0 |     —     |  rs2
  10 |   —    |  op3   |  rs1   | i=1 |       simm13
 31 30 29    25 24    19 18    14  13   12       5 4     0


Description The RETURN instruction causes a delayed transfer of control to the target address
and has the window semantics of a RESTORE instruction; that is, it restores the
register window prior to the last SAVE instruction. The target address is
"R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if i = 1. Registers R[rs1]
and R[rs2] come from the old window.


Like other DCTIs, all effects of RETURN (including modification of CWP) are visible 
prior to execution of the delay slot instruction. 


Programming | To reexecute the trapped instruction when returning from a user trap
Note | handler, use the RETURN instruction in the delay slot of a JMPL
instruction, for example:

    jmpl   %l6, %g0    ! Trapped PC supplied to user trap handler
    return %l7         ! Trapped NPC supplied to user trap handler




Programming | A routine that uses a register window may be structured either as:
Note |
    save    %sp, -framesize, %sp
    ...
    ret                 ! Same as jmpl %i7 + 8, %g0
    restore             ! Something useful like "restore %o2, %l2, %o0"

Or as:

    save    %sp, -framesize, %sp
    ...
    return  %i7 + 8
    nop                 ! Could do some useful work in the
                        ! caller's window, e.g., "or %o1, %o2, %o0"





An attempt to execute a RETURN instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.


A RETURN instruction may cause a window fill exception as part of its RESTORE 
semantics. 


When PSTATE.am = 1, the more-significant 32 bits of the target instruction address
are masked out (set to 0) before being sent to the memory system.


A RETURN instruction causes a mem_address_not_aligned exception if either of the 
two least-significant bits of the target address is nonzero. 


Exceptions    illegal_instruction
              fill_n_normal (n = 0-7)
              fill_n_other (n = 0-7)
              mem_address_not_aligned





7.80 SAVE 


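Instruction   op3       Operation              Assembly Language Syntax           Class
SAVE          11 1100   Save Caller's Window   save reg_rs1, reg_or_imm, reg_rd   A1


Bits:  31:30 | 29:25 | 24:19 | 18:14 | 13  | 12:5     | 4:0
       10    | rd    | op3   | rs1   | i=0 | reserved | rs2
       10    | rd    | op3   | rs1   | i=1 | simm13 (bits 12:0)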


Description The SAVE instruction provides the routine executing it with a new register window. 
The out registers from the old window become the in registers of the new window. 
The contents of the out and the local registers in the new window are zero or contain 
values from the executing process; that is, the process sees a clean window. 


Furthermore, if and only if a spill trap is not generated, SAVE behaves like a normal 
ADD instruction, except that the source operands R[rs1] or R[rs2] are read from the 
old window (that is, the window addressed by the original CWP) and the sum is 
written into R[rd] of the new window (that is, the window addressed by the new 
CWP). 


Note | CWP arithmetic is performed modulo the number of implemented 
windows, N_REG_WINDOWS. 


Programming | Typically, if a SAVE instruction traps, the spill trap handler returns
Notes | to the trapped instruction to reexecute it. So, although the ADD
operation is not performed the first time (when the instruction
traps), it is performed the second time the instruction executes.
The same applies to changing the CWP.


The SAVE instruction can be used to atomically allocate a new 
window in the register file and a new software stack frame in 
memory. For details, see the section “Leaf-Procedure 
Optimization” in Software Considerations, contained in the 
separate volume UltraSPARC Architecture Application Notes. 


There is a performance trade-off to consider between using SAVE/ 
RESTORE and saving and restoring selected registers explicitly. 





Description (Effect on Privileged State) 
If a SAVE instruction does not trap, it increments the CWP (mod N_REG_WINDOWS) 
to provide a new register window and updates the state of the register windows by 
decrementing CANSAVE and incrementing CANRESTORE.


If the new register window is occupied (that is, CANSAVE = 0), a spill trap is 
generated. The trap vector for the spill trap is based on the value of OTHERWIN and 
WSTATE. The spill trap handler is invoked with the CWP set to point to the window 
to be spilled (that is, old CWP + 2). 


An attempt to execute a SAVE instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.


If CANSAVE ≠ 0, the SAVE instruction checks whether the new window needs to be
cleaned. It causes a clean_window trap if the number of unused clean windows is
zero, that is, (CLEANWIN − CANRESTORE) = 0. The clean_window trap handler is
invoked with the CWP set to point to the window to be cleaned (that is, old
CWP + 1).
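The effect of SAVE on the window-control registers, as described above, can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions: the `WindowState` class and `save` function are hypothetical names, the model leaves the state untouched when a trap is reported (real hardware records trap state separately), and the selection of the spill trap vector from OTHERWIN and WSTATE is not modeled.

```python
from dataclasses import dataclass


@dataclass
class WindowState:
    n_windows: int          # N_REG_WINDOWS
    cwp: int = 0
    cansave: int = 0
    canrestore: int = 0
    cleanwin: int = 0


def save(state):
    """Model SAVE's effect on the register-window state.

    Returns (trap_name, cwp_seen_by_handler) if SAVE traps, or
    (None, new_cwp) if it completes.
    """
    n = state.n_windows
    if state.cansave == 0:
        # New window is occupied: the handler sees old CWP + 2.
        return "spill", (state.cwp + 2) % n
    if state.cleanwin - state.canrestore == 0:
        # No unused clean windows: the handler sees old CWP + 1.
        return "clean_window", (state.cwp + 1) % n
    state.cwp = (state.cwp + 1) % n
    state.cansave -= 1
    state.canrestore += 1
    return None, state.cwp
```

With eight windows and CANSAVE = 6, six consecutive SAVEs complete in this model and the seventh reports a spill trap.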


Programming | The vectoring of spill traps can be controlled by setting the value
Note | of the OTHERWIN and WSTATE registers appropriately. For
details, see the section "Splitting the Register Windows" in
Software Considerations, contained in the separate volume
UltraSPARC Architecture Application Notes.


The spill handler normally will end with a SAVED instruction
followed by a RETRY instruction.


Exceptions    illegal_instruction
              spill_n_normal (n = 0-7)
              spill_n_other (n = 0-7)
              clean_window


See Also      RESTORE on page 292





7.81 SAVED 





Instruction   Operation               Assembly Language Syntax   Class
SAVED^P       Window has been saved   saved                      A1

Bits:  31:30 | 29:25        | 24:19   | 18:0
       10    | fcn = 0 0000 | 11 0001 | reserved


Description SAVED adjusts the state of the register-windows control registers. 


SAVED increments CANSAVE. If OTHERWIN = 0, SAVED decrements
CANRESTORE. If OTHERWIN ≠ 0, it decrements OTHERWIN.
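The register update performed by SAVED can be written as a small Python sketch. The function name `saved` and its tuple-based interface are hypothetical, chosen only for illustration. The precondition check mirrors the "undefined behavior" conditions stated later in this section; raising an error for them is a modeling choice, not architectural behavior.

```python
def saved(cansave, canrestore, otherwin, n_windows):
    """Return the (CANSAVE, CANRESTORE, OTHERWIN) values after SAVED."""
    # The section states that behavior is undefined in these cases;
    # this sketch simply refuses to model them.
    if cansave > n_windows - 2 or canrestore == 0:
        raise ValueError("undefined: SAVED cannot produce a valid state")
    cansave += 1
    if otherwin == 0:
        canrestore -= 1
    else:
        otherwin -= 1
    return cansave, canrestore, otherwin
```

In both branches the sum CANSAVE + CANRESTORE + OTHERWIN is unchanged, since one register is incremented and exactly one other is decremented.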


Programming | Trap handler software for register window spills uses the SAVED
Notes | instruction to indicate that a window has been spilled
successfully. For details, see the section "Example Code for Spill
Handler" in Software Considerations, contained in the separate
volume UltraSPARC Architecture Application Notes.


Normal privileged software would probably not execute a SAVED 
instruction from trap level zero (TL = 0). However, it is not illegal 
to do so and doing so does not cause a trap. 


Executing a SAVED instruction outside of a window spill trap 
handler is likely to create an inconsistent window state. Hardware 
will not signal an exception, however, since maintaining a 
consistent window state is the responsibility of privileged 
software. 





If CANSAVE > (N_REG_WINDOWS − 2) or CANRESTORE = 0 just prior to execution of
a SAVED instruction, the subsequent behavior of the processor is undefined. In
neither of these cases can SAVED generate a register window state that is both valid
(see Register Window State Definition on page 85) and consistent with the state prior to
the SAVED.


An attempt to execute a SAVED instruction when instruction bits 18:0 are nonzero
causes an illegal_instruction exception.


An attempt to execute a SAVED instruction in nonprivileged mode (PSTATE.priv = 0)
causes a privileged_opcode exception.


Exceptions    illegal_instruction
              privileged_opcode


See Also      ALLCLEAN on page 136
              INVALW on page 225
              NORMALW on page 274
              OTHERW on page 276
              RESTORED on page 294


SDIV, SDIVcc (Deprecated)





7.82 Signed Divide (64-bit ÷ 32-bit)


The SDIV and SDIVcc instructions are deprecated and should not be used in new 
software. The SDIVX instruction should be used instead. 





Opcode     op3       Operation                               Assembly Language Syntax             Class
SDIV^D     00 1111   Signed Integer Divide                   sdiv reg_rs1, reg_or_imm, reg_rd     D2
SDIVcc^D   01 1111   Signed Integer Divide and modify cc's   sdivcc reg_rs1, reg_or_imm, reg_rd   D2


Bits:  31:30 | 29:25 | 24:19 | 18:14 | 13  | 12:5     | 4:0
       10    | rd    | op3   | rs1   | i=0 | reserved | rs2
       10    | rd    | op3   | rs1   | i=1 | simm13 (bits 12:0)


Description The signed divide instructions perform 64-bit by 32-bit division, producing a 32-bit
result. If i = 0, they compute "(Y :: R[rs1]{31:0}) ÷ R[rs2]{31:0}". Otherwise (that is, if
i = 1), the divide instructions compute "(Y :: R[rs1]{31:0}) ÷
(sign_ext(simm13){31:0})". In either case, if overflow does not occur, the less
significant 32 bits of the integer quotient are sign- or zero-extended to 64 bits and are
written into R[rd].


The contents of the Y register are undefined after any 64-bit by 32-bit integer divide 
operation. 


Signed Divide    Signed divide (SDIV, SDIVcc) assumes a signed integer doubleword dividend
(Y :: lower 32 bits of R[rs1]) and a signed integer word divisor (lower 32 bits of
R[rs2] or lower 32 bits of sign_ext(simm13)) and computes a signed integer word
quotient (R[rd]).


Signed division rounds an inexact quotient toward zero. For example, −7 ÷ 4 equals
the rational quotient of −1.75, which rounds to −1 (not −2) when rounding toward
zero.


The result of a signed divide can overflow the low-order 32 bits of the destination 
register R[rd] under certain conditions. When overflow occurs, the largest 
appropriate signed integer is returned as the quotient in R[rd]. The conditions under 
which overflow occurs and the value returned in R[rd] under those conditions are 
specified in TABLE 7-13.


TABLE 7-13 SDIV / SDIVcc Overflow Detection and Value Returned

Condition Under Which Overflow Occurs   Value Returned in R[rd]
Rational quotient ≥ 2^31                2^31 − 1 (0000 0000 7FFF FFFF_16)
Rational quotient ≤ −2^31 − 1           −2^31 (FFFF FFFF 8000 0000_16)


When no overflow occurs, the 32-bit result is sign-extended to 64 bits and written 
into register R[rd]. 
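The dividend construction, round-toward-zero behavior, and overflow clamping described above can be checked with a short Python model. This is a sketch under assumptions: the names `to_signed` and `sdiv` are hypothetical, the second operand is passed in already selected (R[rs2] or the sign-extended immediate), and a Python `ZeroDivisionError` stands in for the division_by_zero exception.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sdiv(y, rs1, operand2):
    """Return (R[rd] as an unsigned 64-bit value, overflow flag)."""
    divisor = to_signed(operand2, 32)
    if divisor == 0:
        raise ZeroDivisionError("division_by_zero")
    # Dividend is Y concatenated with the low 32 bits of R[rs1].
    dividend = to_signed(((y & MASK32) << 32) | (rs1 & MASK32), 64)
    # Python's // rounds toward negative infinity, so divide magnitudes
    # and reapply the sign to round toward zero.
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    if quotient >= 2**31:
        return 0x000000007FFFFFFF, True
    if quotient <= -(2**31) - 1:
        return 0xFFFFFFFF80000000, True
    return quotient & MASK64, False
```

With Y = FFFF FFFF_16 and R[rs1]{31:0} = FFFF FFF9_16 (the 64-bit value −7), dividing by 4 gives −1 in this model, matching the rounding example above.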


SDIV does not affect the condition code bits. SDIVcc writes the integer condition 
code bits as shown in the following table. Note that negative (N) and zero (Z) are set 
according to the value of R[rd] after it has been set to reflect overflow, if any. 





Bit     Effect on bit of SDIVcc instruction
icc.n   Set to 1 if R[rd]{31} = 1; otherwise, set to 0
icc.z   Set to 1 if R[rd]{31:0} = 0; otherwise, set to 0
icc.v   Set to 1 if overflow (per TABLE 7-13); otherwise, set to 0
icc.c   Set to 0
xcc.n   Set to 1 if R[rd]{63} = 1; otherwise, set to 0
xcc.z   Set to 1 if R[rd]{63:0} = 0; otherwise, set to 0
xcc.v   Set to 0
xcc.c   Set to 0





An attempt to execute an SDIV or SDIVcc instruction when i = 0 and instruction bits
12:5 are nonzero causes an illegal_instruction exception.


Exceptions    illegal_instruction
              division_by_zero


See Also      MULScc on page 270
              RDY on page 287
              UDIV[cc] on page 354





7.83 SETHI 


Instruction   op2   Operation                      Assembly Language Syntax   Class
SETHI         100   Set High 22 Bits of Low Word   sethi const22, reg_rd      A1
                                                   sethi %hi(value), reg_rd


Bits:  31:30 | 29:25 | 24:22 | 21:0
       00    | rd    | op2   | imm22


Description SETHI zeroes the least significant 10 bits and the most significant 32 bits of R[rd] and 
replaces bits 31 through 10 of R[rd] with the value from its imm22 field. 


SETHI does not affect the condition codes. 


Some SETHI instructions with rd = 0 have special uses:
■ rd = 0 and imm22 = 0: defined to be a NOP instruction (described in No Operation)
■ rd = 0 and imm22 ≠ 0: may be used to trigger hardware performance counters in
  some UltraSPARC Architecture implementations (for details, see
  implementation-specific documentation).


Programming | The most common form of 64-bit constant generation is creating
Note | stack offsets whose magnitude is less than 2^32. The code below can
be used to create the constant 0000 0000 ABCD 1234_16:

    sethi   %hi(0xabcd1234), %o0
    or      %o0, 0x234, %o0

The following code shows how to create a negative constant. Note:
The immediate field of the xor instruction is sign extended and can
be used to place 1's in all of the upper 32 bits. For example, to set the
negative constant FFFF FFFF ABCD 1234_16:

    sethi   %hi(0x5432edcb), %o0    ! note 0x5432EDCB, not 0xABCD1234
    xor     %o0, 0x1e34, %o0        ! part of imm. overlaps upper bits
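The arithmetic behind both sequences in the note above can be verified with a few lines of Python. This is only a check of the bit manipulation; the helper names `hi`, `sethi`, and `sign_ext13` are hypothetical stand-ins for the assembler's %hi operator, the SETHI result, and the sign extension of the 13-bit immediate.

```python
MASK64 = (1 << 64) - 1


def hi(value):
    """Bits 31:10 of value, as the assembler's %hi() operator yields."""
    return (value >> 10) & 0x3FFFFF


def sethi(imm22):
    """SETHI result: imm22 in bits 31:10, all other bits zero."""
    return (imm22 & 0x3FFFFF) << 10


def sign_ext13(simm13):
    """Sign-extend a 13-bit immediate to an unsigned 64-bit value."""
    simm13 &= 0x1FFF
    if simm13 & 0x1000:
        simm13 -= 0x2000
    return simm13 & MASK64


# Positive constant: sethi supplies bits 31:10, or supplies bits 9:0.
positive = sethi(hi(0xABCD1234)) | 0x234

# Negative constant: start from the complement of the low word, then
# xor with a sign-extended immediate whose upper bits are all ones.
negative = sethi(hi(0x5432EDCB)) ^ sign_ext13(0x1E34)
```

Here 0x5432EDCB is the bitwise complement of 0xABCD1234, which is why the xor with an all-ones upper part restores the intended low word while setting the upper 32 bits.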


Exceptions None 


306 UltraSPARC Architecture 2005 • Draft D0.9.2, 19 Jun 2008


SHUTDOWN (Deprecated)


7.84 SHUTDOWN


The SHUTDOWN instruction is deprecated and should not be used in new 
software. 


Instruction    opf           Operation              Assembly Language Syntax   Class
SHUTDOWN^D,P   0 1000 0000   Enter low-power mode   shutdown                   D3


Bits:  31:30 | 29:25    | 24:19   | 18:14    | 13:5 | 4:0
       10    | reserved | 11 0110 | reserved | opf  | reserved


Description | SHUTDOWN is a deprecated, privileged instruction that was used in early 
UltraSPARC implementations to bring the virtual processor or its containing system 
into a low-power state in an orderly manner. It had no effect on software-visible 
virtual processor state. 


On an UltraSPARC Architecture implementation operating in privileged mode, 
SHUTDOWN behaves like a NOP (impl. dep. #206-U3-Cs10). 


In an UltraSPARC Architecture 2005 implementation, this instruction is not
implemented in hardware, causes an illegal_instruction exception, and its effect is
emulated in software.


Exceptions    illegal_instruction (instruction not implemented in hardware)


SIAM





7.85 Set Interval Arithmetic Mode 


Instruction   opf           Operation                                            Assembly Language Syntax   Class
SIAM          0 1000 0001   Set the interval arithmetic mode fields in the GSR   siam siam_mode             B1


Bits:  31:30 | 29:25    | 24:19   | 18:14    | 13:5 | 4:3      | 2:0
       10    | reserved | 11 0110 | reserved | opf  | reserved | mode


Description The SIAM instruction sets the GSR.im and GSR.irnd fields as follows:
GSR.im ← mode{2}
GSR.irnd ← mode{1:0}

Note | When GSR.im is set to 1, all subsequent floating-point
instructions requiring round mode settings derive rounding-mode
information from the General Status Register (GSR.irnd)
instead of the Floating-Point State Register (FSR.rd).


Note | When GSR.im = 1, the processor operates in standard floating-point
mode regardless of the setting of FSR.ns.
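As a minimal illustration of the field assignment and of the first note, the following Python sketch splits the 3-bit mode operand and selects the rounding-mode source. The function names are hypothetical, and the sketch deliberately does not place the fields at any bit position within the GSR, since this section does not give those positions.

```python
def siam(mode):
    """Split the 3-bit SIAM mode operand into the GSR.im and GSR.irnd fields."""
    return {"im": (mode >> 2) & 0x1, "irnd": mode & 0x3}


def effective_rounding_mode(gsr_im, gsr_irnd, fsr_rd):
    """Rounding mode seen by floating-point instructions that need one."""
    return gsr_irnd if gsr_im == 1 else fsr_rd
```

For instance, `siam(0b110)` yields im = 1 and irnd = 2, after which `effective_rounding_mode` returns the GSR value regardless of FSR.rd.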





An attempt to execute a SIAM instruction when instruction bits 29:25, 18:14, or 4:3
are nonzero causes an illegal_instruction exception.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute a SIAM instruction causes an fp_disabled exception.


Exceptions    illegal_instruction
              fp_disabled


SLL/SRL/SRA 





7.86 Shift

Instruction   op3       x   Operation                           Assembly Language Syntax             Class
SLL           10 0101   0   Shift Left Logical - 32 bits        sll reg_rs1, reg_or_shcnt, reg_rd    A1
SRL           10 0110   0   Shift Right Logical - 32 bits       srl reg_rs1, reg_or_shcnt, reg_rd    A1
SRA           10 0111   0   Shift Right Arithmetic - 32 bits    sra reg_rs1, reg_or_shcnt, reg_rd    A1
SLLX          10 0101   1   Shift Left Logical - 64 bits        sllx reg_rs1, reg_or_shcnt, reg_rd   A1
SRLX          10 0110   1   Shift Right Logical - 64 bits       srlx reg_rs1, reg_or_shcnt, reg_rd   A1
SRAX          10 0111   1   Shift Right Arithmetic - 64 bits    srax reg_rs1, reg_or_shcnt, reg_rd   A1

Bits:  31:30 | 29:25 | 24:19 | 18:14 | 13  | 12  | remaining bits
       10    | rd    | op3   | rs1   | i=0 | x   | 11:5 reserved, 4:0 rs2
       10    | rd    | op3   | rs1   | i=1 | x=0 | 11:5 reserved, 4:0 shcnt32
       10    | rd    | op3   | rs1   | i=1 | x=1 | 11:6 reserved, 5:0 shcnt64


Description These instructions perform logical or arithmetic shift operations. 


When i = 0 and x = 0, the shift count is the least significant five bits of R[rs2]. 
When i = 0 and x = 1, the shift count is the least significant six bits of R[rs2]. When 
i = 1 and x = 0, the shift count is the immediate value specified in bits 0 through 4 of 
the instruction. 

When i = 1 and x = 1, the shift count is the immediate value specified in bits 0 
through 5 of the instruction. 


TABLE 7-14 shows the shift count encodings for all values of i and x. 


TABLE 7-14 Shift Count Encodings

i   x   Shift Count
0   0   bits 4-0 of R[rs2]
0   1   bits 5-0 of R[rs2]
1   0   bits 4-0 of instruction
1   1   bits 5-0 of instruction





SLL and SLLX shift all 64 bits of the value in R[rs1] left by the number of bits 
specified by the shift count, replacing the vacated positions with zeroes, and write 
the shifted result to R[rd].


SRL shifts the low 32 bits of the value in R[rs1] right by the number of bits specified 
by the shift count. Zeroes are shifted into bit 31. The upper 32 bits are set to zero, 
and the result is written to R[rd]. 


SRLX shifts all 64 bits of the value in R[rs1] right by the number of bits specified by 
the shift count. Zeroes are shifted into the vacated high-order bit positions, and the 
shifted result is written to R[rd]. 


SRA shifts the low 32 bits of the value in R[rs1] right by the number of bits specified 
by the shift count and replaces the vacated positions with bit 31 of R[rs1]. The high- 
order 32 bits of the result are all set with bit 31 of R[rs1], and the result is written to 
R[rd]. 


SRAX shifts all 64 bits of the value in R[rs1] right by the number of bits specified by 
the shift count and replaces the vacated positions with bit 63 of R[rs1]. The shifted 
result is written to R[rd]. 


No shift occurs when the shift count is 0, but the high-order bits are affected by the 
32-bit shifts as noted above. 
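The six shift variants described above can be modeled in Python as follows. This is an illustrative sketch, not a definitive implementation: the function names and the boolean `x` parameter (True for the 64-bit SLLX/SRLX/SRAX forms) are assumptions made for the example, and register values are represented as unsigned 64-bit integers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def _to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def _count(count, x):
    """Use six count bits for the 64-bit forms, five for the 32-bit forms."""
    return count & (0x3F if x else 0x1F)


def sll(rs1, count, x=False):
    """SLL / SLLX: both shift all 64 bits left; only the count width differs."""
    return (rs1 << _count(count, x)) & MASK64


def srl(rs1, count, x=False):
    """SRL shifts the low 32 bits and zeroes the upper 32; SRLX shifts all 64."""
    source = (rs1 & MASK64) if x else (rs1 & MASK32)
    return source >> _count(count, x)


def sra(rs1, count, x=False):
    """SRA replicates bit 31 into the upper bits; SRAX replicates bit 63."""
    signed = _to_signed(rs1, 64 if x else 32)
    return (signed >> _count(count, x)) & MASK64
```

A zero count shows the high-order effect of the 32-bit forms: `sra(v, 0)` sign-extends the low word of `v`, and `srl(v, 0)` clears its upper 32 bits.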


These instructions do not modify the condition codes. 


Programming | “Arithmetic left shift by 1 (and calculate overflow)” can be 
Notes | effected with the ADDcc instruction. 


The instruction "sra reg_rs1, 0, reg_rd" can be used to convert a 32-bit
value to 64 bits, with sign extension into the upper word. "srl
reg_rs1, 0, reg_rd" can be used to clear the upper 32 bits of R[rd].


An attempt to execute a SLL, SRL, or SRA instruction when instruction bits 11:5 are
nonzero causes an illegal_instruction exception.


An attempt to execute a SLLX, SRLX, or SRAX instruction when either of the
following conditions exists causes an illegal_instruction exception:


■ i = 0 or x = 0 and instruction bits 11:5 are nonzero
■ x = 1 and instruction bits 11:6 are nonzero


Exceptions    illegal_instruction


SMUL, SMULcc (Deprecated)





7.87 Signed Multiply (32-bit) 


The SMUL and SMULcc instructions are deprecated and should not be used in 
new software. The MULX instruction should be used instead. 


Opcode     op3       Operation                                 Assembly Language Syntax             Class
SMUL^D     00 1011   Signed Integer Multiply                   smul reg_rs1, reg_or_imm, reg_rd     D2
SMULcc^D   01 1011   Signed Integer Multiply and modify cc's   smulcc reg_rs1, reg_or_imm, reg_rd   D2


Bits:  31:30 | 29:25 | 24:19 | 18:14 | 13  | 12:5     | 4:0
       10    | rd    | op3   | rs1   | i=0 | reserved | rs2
       10    | rd    | op3   | rs1   | i=1 | simm13 (bits 12:0)


Description The signed multiply instructions perform 32-bit by 32-bit multiplications, producing 
64-bit results. They compute "R[rs1]{31:0} × R[rs2]{31:0}" if i = 0, or "R[rs1]{31:0} ×
sign_ext(simm13){31:0}" if i = 1. They write the 32 most significant bits of the
product into the Y register and all 64 bits of the product into R[rd]. 


Signed multiply instructions (SMUL, SMULcc) operate on signed integer word 
operands and compute a signed integer doubleword product. 


SMUL does not affect the condition code bits. SMULcc writes the integer condition 
code bits, icc and xcc, as shown below. 


Bit     Effect on bit by execution of SMULcc
icc.n   Set to 1 if product{31} = 1; otherwise, set to 0
icc.z   Set to 1 if product{31:0} = 0; otherwise, set to 0
icc.v   Set to 0
icc.c   Set to 0
xcc.n   Set to 1 if product{63} = 1; otherwise, set to 0
xcc.z   Set to 1 if product{63:0} = 0; otherwise, set to 0
xcc.v   Set to 0
xcc.c   Set to 0





Note | 32-bit negative (icc.n) and zero (icc.z) condition codes are set 
according to the less significant word of the product, not 
according to the full 64-bit result.


Programming | 32-bit overflow after SMUL or SMULcc is indicated by
Notes | Y ≠ (R[rd] >> 31), where ">>" indicates 32-bit arithmetic right-shift.
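The product, the Y register value, and the overflow test from the note above can be sketched in Python. The names `smul` and `overflow32` are hypothetical, the second operand is passed in already selected (R[rs2] or the sign-extended immediate), and values are represented as unsigned integers of the stated width.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def _to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value >> 31 else value


def smul(rs1, operand2):
    """Return (R[rd], Y) for a signed 32-bit by 32-bit multiply."""
    product = (_to_signed32(rs1) * _to_signed32(operand2)) & MASK64
    y = product >> 32               # 32 most significant bits of the product
    return product, y


def overflow32(rd, y):
    """True if the product does not fit in a signed 32-bit word.

    Implements Y != (R[rd] >> 31) using a 32-bit arithmetic right shift,
    which yields either all zeros or all ones.
    """
    sign_fill = (_to_signed32(rd) >> 31) & MASK32
    return y != sign_fill
```

For example, (−2) × 3 fits in 32 bits, so Y is all ones and equals the sign fill of the low word, while 65536 × 65536 leaves Y = 1 with a zero low word and is reported as overflow.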


An attempt to execute a SMUL or SMULcc instruction when i = 0 and instruction bits
12:5 are nonzero causes an illegal_instruction exception.


Exceptions    illegal_instruction


See Also      UMUL[cc] on page 356


STB / STH / STW / STX





7.88 Store Integer


Instruction   op3       Operation             Assembly Language Syntax   Class
STB           00 0101   Store Byte            stb† reg_rd, [address]     A1
STH           00 0110   Store Halfword        sth‡ reg_rd, [address]     A1
STW           00 0100   Store Word            stw◊ reg_rd, [address]     A1
STX           00 1110   Store Extended Word   stx reg_rd, [address]      A1

† synonyms: stub, stsb    ‡ synonyms: stuh, stsh    ◊ synonyms: st, stuw, stsw


Bits:  31:30 | 29:25 | 24:19 | 18:14 | 13  | 12:5     | 4:0
       11    | rd    | op3   | rs1   | i=0 | reserved | rs2
       11    | rd    | op3   | rs1   | i=1 | simm13 (bits 12:0)


Description The store integer instructions (except store doubleword) copy the whole extended
(64-bit) integer, the less significant word, the least significant halfword, or the least
significant byte of R[rd] into memory.


These instructions access memory using the implicit ASI (see page 104). The effective
address for these instructions is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


A successful store (notably, STX) integer instruction operates atomically. 


An attempt to execute a store integer instruction when i = 0 and instruction bits 12:5
are nonzero causes an illegal_instruction exception.


STH causes a mem_address_not_aligned exception if the effective address is not
halfword-aligned. STW causes a mem_address_not_aligned exception if the effective
address is not word-aligned. STX causes a mem_address_not_aligned exception if
the effective address is not doubleword-aligned.
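The size selection and alignment rules above can be illustrated with a small Python sketch operating on a byte array. It rests on assumptions that are not stated in this section: memory is modeled as a flat `bytearray`, the access is taken to be big-endian, and the function name `store_integer` and the `SIZES` table are hypothetical.

```python
# Access size in bytes for each store integer instruction.
SIZES = {"STB": 1, "STH": 2, "STW": 4, "STX": 8}


def store_integer(memory, op, rd_value, effective_address):
    """Store the low-order bytes of R[rd]; return an exception name or None."""
    size = SIZES[op]
    # STB has no alignment requirement; the others need natural alignment.
    if effective_address % size != 0:
        return "mem_address_not_aligned"
    low_bits = rd_value & ((1 << (8 * size)) - 1)
    # Assumption: big-endian byte order for this access.
    memory[effective_address:effective_address + size] = low_bits.to_bytes(size, "big")
    return None
```

Storing 1122 3344 5566 7788_16 with STH at address 2 writes only the least significant halfword (77 88), while STW at the same address reports mem_address_not_aligned.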


Exceptions    illegal_instruction
              mem_address_not_aligned
              VA_watchpoint


See Also      STTW on page 334


7.89 Store Integer into Alternate Space


Instruction   op3       Operation                                   Assembly Language Syntax              Class
STBA^P_ASI    01 0101   Store Byte into Alternate Space             stba† reg_rd, [regaddr] imm_asi       A1
                                                                    stba reg_rd, [reg_plus_imm] %asi
STHA^P_ASI    01 0110   Store Halfword into Alternate Space         stha‡ reg_rd, [regaddr] imm_asi       A1
                                                                    stha reg_rd, [reg_plus_imm] %asi
STWA^P_ASI    01 0100   Store Word into Alternate Space             stwa◊ reg_rd, [regaddr] imm_asi       A1
                                                                    stwa reg_rd, [reg_plus_imm] %asi
STXA^P_ASI    01 1110   Store Extended Word into Alternate Space    stxa reg_rd, [regaddr] imm_asi        A1
                                                                    stxa reg_rd, [reg_plus_imm] %asi

† synonyms: stuba, stsba    ‡ synonyms: stuha, stsha    ◊ synonyms: sta, stuwa, stswa


Bits:  31:30 | 29:25 | 24:19 | 18:14 | 13  | 12:5    | 4:0
       11    | rd    | op3   | rs1   | i=0 | imm_asi | rs2
       11    | rd    | op3   | rs1   | i=1 | simm13 (bits 12:0)


Description


The store integer into alternate space instructions copy the whole extended (64-bit) 
integer, the less significant word, the least significant halfword, or the least 
significant byte of R[rd] into memory. 


Store integer to alternate space instructions contain the address space identifier (ASI)
to be used for the store in the imm_asi field if i = 0, or in the ASI register if i = 1. The
access is privileged if bit 7 of the ASI is 0; otherwise, it is not privileged. The
effective address for these instructions is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


A successful store (notably, STXA) instruction operates atomically. 


In nonprivileged mode (PSTATE.priv = 0), if bit 7 of the ASI is 0, these instructions
cause a privileged_action exception. In privileged mode (PSTATE.priv = 1), if the ASI
is in the range 30_16 to 7F_16, these instructions cause a privileged_action exception.
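The two privilege rules in the preceding paragraph reduce to a short predicate, sketched here in Python. The function name `asi_privilege_check` is hypothetical, and the sketch covers only the two cases described in this section (nonprivileged and privileged mode).

```python
def asi_privilege_check(asi, privileged):
    """Return "privileged_action" if the ASI is not permitted, else None.

    `privileged` is True when PSTATE.priv = 1.
    """
    if not privileged and (asi & 0x80) == 0:
        # Bit 7 clear marks a privileged ASI; nonprivileged code may not use it.
        return "privileged_action"
    if privileged and 0x30 <= asi <= 0x7F:
        # ASIs 0x30 through 0x7F are not available to privileged mode.
        return "privileged_action"
    return None
```

For example, ASI 16_16 is rejected in nonprivileged mode but allowed in privileged mode, and any ASI with bit 7 set passes this particular check in both modes.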


STHA causes a mem_address_not_aligned exception if the effective address is not
halfword-aligned. STWA causes a mem_address_not_aligned exception if the
effective address is not word-aligned. STXA causes a mem_address_not_aligned
exception if the effective address is not doubleword-aligned.


STBA, STHA, and STWA can be used with any of the following ASIs, subject to the
privilege mode rules described for the privileged_action exception above. Use of any
other ASI with these instructions causes a data_access_exception exception.





ASIs valid for STBA, STHA, and STWA

ASI_NUCLEUS                  ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY       ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY     ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                     ASI_REAL_LITTLE
ASI_REAL_IO                  ASI_REAL_IO_LITTLE
ASI_PRIMARY                  ASI_PRIMARY_LITTLE
ASI_SECONDARY                ASI_SECONDARY_LITTLE

STXA can be used with any ASI (including, but not limited to, the above list), unless
it either (a) violates the privilege mode rules described for the privileged_action
exception above or (b) is used with any of the following ASIs, which causes a
data_access_exception exception.





ASIs invalid for STXA (cause data_access_exception exception)

ASI_BLOCK_AS_IF_USER_PRIMARY            ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE
ASI_BLOCK_AS_IF_USER_SECONDARY          ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE
24_16 (aliased to 27_16, ASI_TWINX_N)   2C_16 (aliased to 2F_16, ASI_TWINX_NL)
24_16 (deprecated ASI_QUAD_LDD)         2C_16 (deprecated ASI_QUAD_LDD_L)
ASI_PST8_PRIMARY                        ASI_PST8_PRIMARY_LITTLE
ASI_PST8_SECONDARY                      ASI_PST8_SECONDARY_LITTLE
ASI_PRIMARY_NO_FAULT                    ASI_PRIMARY_NO_FAULT_LITTLE
ASI_SECONDARY_NO_FAULT                  ASI_SECONDARY_NO_FAULT_LITTLE
ASI_PST16_PRIMARY                       ASI_PST16_PRIMARY_LITTLE
ASI_PST16_SECONDARY                     ASI_PST16_SECONDARY_LITTLE
ASI_PST32_PRIMARY                       ASI_PST32_PRIMARY_LITTLE
ASI_PST32_SECONDARY                     ASI_PST32_SECONDARY_LITTLE
ASI_FL8_PRIMARY                         ASI_FL8_PRIMARY_LITTLE
ASI_FL8_SECONDARY                       ASI_FL8_SECONDARY_LITTLE
ASI_FL16_PRIMARY                        ASI_FL16_PRIMARY_LITTLE
ASI_FL16_SECONDARY                      ASI_FL16_SECONDARY_LITTLE
ASI_BLOCK_COMMIT_PRIMARY                ASI_BLOCK_COMMIT_SECONDARY
ASI_BLOCK_PRIMARY                       ASI_BLOCK_PRIMARY_LITTLE
ASI_BLOCK_SECONDARY                     ASI_BLOCK_SECONDARY_LITTLE



V8 Compatibility | The SPARC V8 STA instruction was renamed STWA in the 
Note | SPARC V9 architecture.


Exceptions    mem_address_not_aligned (all except STBA)
              privileged_action
              VA_watchpoint


See Also      LDA on page 229
              STTWA on page 336


STBLOCKF





7.90 Block Store 


The STBLOCKF instruction is intended to be a processor-specific instruction,
which may or may not be implemented in future UltraSPARC Architecture
implementations. Therefore, it should only be used in platform-specific
dynamically-linked libraries or in software created by a runtime code generator
that is aware of the specific virtual processor implementation on which it is
executing.

Instruction   ASI Value   Operation                                        Assembly Language Syntax                  Class
STBLOCKF      16_16       64-byte block store to primary address           stda freg_rd, [regaddr] #ASI_BLK_AIUP     A2
                          space, user privilege                            stda freg_rd, [reg_plus_imm] %asi
STBLOCKF      17_16       64-byte block store to secondary address         stda freg_rd, [regaddr] #ASI_BLK_AIUS     A2
                          space, user privilege                            stda freg_rd, [reg_plus_imm] %asi
STBLOCKF      1E_16       64-byte block store to primary address           stda freg_rd, [regaddr] #ASI_BLK_AIUPL    A2
                          space, little-endian, user privilege             stda freg_rd, [reg_plus_imm] %asi
STBLOCKF      1F_16       64-byte block store to secondary address         stda freg_rd, [regaddr] #ASI_BLK_AIUSL    A2
                          space, little-endian, user privilege             stda freg_rd, [reg_plus_imm] %asi
STBLOCKF      F0_16       64-byte block store to primary address           stda freg_rd, [regaddr] #ASI_BLK_P        A2
                          space                                            stda freg_rd, [reg_plus_imm] %asi
STBLOCKF      F1_16       64-byte block store to secondary address         stda freg_rd, [regaddr] #ASI_BLK_S        A2
                          space                                            stda freg_rd, [reg_plus_imm] %asi
STBLOCKF      F8_16       64-byte block store to primary address           stda freg_rd, [regaddr] #ASI_BLK_PL       A2
                          space, little-endian                             stda freg_rd, [reg_plus_imm] %asi
STBLOCKF      F9_16       64-byte block store to secondary address         stda freg_rd, [regaddr] #ASI_BLK_SL       A2
                          space, little-endian                             stda freg_rd, [reg_plus_imm] %asi


Bits:  31:30 | 29:25 | 24:19   | 18:14 | 13  | 12:5    | 4:0
       11    | rd    | 11 0111 | rs1   | i=0 | imm_asi | rs2
       11    | rd    | 11 0111 | rs1   | i=1 | simm13 (bits 12:0)


Description A block store instruction references one of several special block-transfer ASIs.
Block-transfer ASIs allow block stores to be performed accessing the same address space as
normal stores. Little-endian ASIs (those with an 'L' suffix) access data in little-endian
format; otherwise, the access is assumed to be big-endian. Byte swapping is
performed separately for each of the eight double-precision registers accessed by the
instruction.

Programming | The block store instruction, STBLOCKF, and its companion,
Note | LDBLOCKF, were originally defined to provide a fast
mechanism for block-copy operations.


STBLOCKF stores data from the eight double-precision floating-point registers 
specified by rd to a 64-byte-aligned memory area. The lowest-addressed eight bytes 
in memory are stored from the lowest-numbered double-precision rd. 


While a STBLOCKF operation is in progress, any of the following values may be
observed in a destination doubleword memory location: (1) the old data value, (2)
zero, or (3) the new data value. When the operation is complete, only the new data
values will be seen.
Compatibility | Software written for older UltraSPARC implementations 
Note | that reads data being written by STBLOCKF instructions 
may or may not allow for case (2) above. Such software 
should be checked to verify that either it always waits 
for STBLOCKF to complete before reading the values 
written, or that it will operate correctly if an intermediate 
value of zero (not the "old" or "new" data values) is 
observed while the STBLOCKF operation is in progress. 


A Block Store only guarantees atomicity for each 64-bit (8-byte) portion of the 64 
bytes that it stores. 


Software should assume the following (where "load operation" includes load, load-store,
and LDBLOCKF instructions and "store operation" includes store, load-store,
and STBLOCKF instructions):


■ A STBLOCKF does not follow memory ordering with respect to earlier or later
  load operations. If there is overlap between the addresses of destination memory
  locations of a STBLOCKF and the source address of a later load operation, the
  load operation may receive incorrect data. Therefore, if ordering with respect to
  later load operations is important, a MEMBAR #StoreLoad instruction must be
  executed between the STBLOCKF and subsequent load operations.


■ A STBLOCKF does not follow memory ordering with respect to earlier or later
  store operations. Those instructions' data may commit to memory in a different
  order from the one in which those instructions were issued. Therefore, if ordering
  with respect to later store operations is important, a MEMBAR #StoreStore
  instruction must be executed between the STBLOCKF and subsequent store
  operations.


■ STBLOCKFs do not follow register dependency interlocks, as do ordinary stores.

Programming | STBLOCKF is intended to be a processor-specific instruction (see
Note | the warning at the top of page 317). If STBLOCKF must be used
in software intended to be portable across current and previous
processor implementations, then it must be coded to work in the
face of any implementation variation that is permitted by
implementation dependency #411-S10, described below.


IMPL. DEP. #411-S10: The following aspects of the behavior of the block store
(STBLOCKF) instruction are implementation dependent:

■ The memory ordering model that STBLOCKF follows (other than as constrained
  by the rules outlined above).

■ Whether VA_watchpoint exceptions are recognized on accesses to all 64 bytes of
  the STBLOCKF (the recommended behavior), or only on accesses to the first eight
  bytes.

■ Whether STBLOCKFs to non-cacheable (TTE.cp = 0) pages execute in strict
  program order or not. If not, a STBLOCKF to a non-cacheable page causes an
  illegal_instruction exception.

■ Whether STBLOCKF follows register dependency interlocks (as ordinary stores
  do).

■ Whether a STBLOCKF forces the data to be written to memory and invalidates
  copies in all caches present.

■ Any other restrictions on the behavior of STBLOCKF, as described in
  implementation-specific documentation.


Exceptions. An illegal_instruction exception occurs if the source floating-point
registers are not aligned on an eight-register boundary.


If the FPU is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if no FPU is present, an
attempt to execute a STBLOCKF instruction causes an fp_disabled exception.


If the least significant 6 bits of the memory address are not all zero, a
mem_address_not_aligned exception occurs.


In nonprivileged mode (PSTATE.priv = 0), if bit 7 of the ASI is 0 (ASIs 16_16, 17_16,
1E_16, and 1F_16), STBLOCKF causes a privileged_action exception.


An access caused by STBLOCKF may trigger a VA_watchpoint exception (impl. dep.
#411-S10).
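The exception conditions listed above can be collected into a single Python check. This is a sketch under assumptions: the function name and parameters are hypothetical, the source register is identified by its index among the double-precision registers (so "eight-register boundary" is modeled as an index that is a multiple of 8), and the function reports every condition that applies rather than choosing one, because this section does not state a priority order. VA_watchpoint is omitted since its recognition is implementation dependent.

```python
# Block-transfer ASIs with bit 7 clear, per the table for this instruction.
PRIVILEGED_BLOCK_ASIS = {0x16, 0x17, 0x1E, 0x1F}


def stblockf_exceptions(first_dreg_index, address, asi, privileged, fpu_enabled=True):
    """Return the list of exception conditions that apply to a STBLOCKF."""
    found = []
    if first_dreg_index % 8 != 0:
        # Assumption: index counts double-precision registers.
        found.append("illegal_instruction")
    if not fpu_enabled:
        found.append("fp_disabled")
    if address & 0x3F:
        # The memory area must be 64-byte aligned.
        found.append("mem_address_not_aligned")
    if not privileged and asi in PRIVILEGED_BLOCK_ASIS:
        found.append("privileged_action")
    return found
```

A store from the first double-precision register to a 64-byte-aligned address through ASI F0_16 raises none of these conditions in the model.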


Implementation | STBLOCKF shares an opcode with the STDFA, STPARTIALF, 
Note | and STSHORTF instructions; it is distinguished by the ASI used. 


Exceptions    illegal_instruction
              mem_address_not_aligned
              privileged_action
              VA_watchpoint (impl. dep. #411-S10)


See Also      LDBLOCKF on page 232


STF / STDF / STQF





7.91 Store Floating-Point

Instruction  op3      rd    Operation                             Assembly Language        Class
STF          10 0100  0-31  Store Floating-Point register         st   freg_rd, [address]  A1
STDF         10 0111  †     Store Double Floating-Point register  std  freg_rd, [address]  A1
STQF         10 0110  †     Store Quad Floating-Point register    stq  freg_rd, [address]  C3

† Encoded floating-point register value, as described on page 51.

Instruction format (bits 31:30 | 29:25 | 24:19 | 18:14 | 13 | 12:5 | 4:0):
  op = 11 | rd | op3 | rs1 | i = 0 | (unused) | rs2
  op = 11 | rd | op3 | rs1 | i = 1 | simm13 (bits 12:0)

Description The store single floating-point instruction (STF) copies the contents of the 32-bit
floating-point register F_S[rd] into memory.


The store double floating-point instruction (STDF) copies the contents of 64-bit
floating-point register F_D[rd] into a word-aligned doubleword in memory. The unit
of atomicity for STDF is 4 bytes (one word).

The store quad floating-point instruction (STQF) copies the contents of 128-bit
floating-point register F_Q[rd] into a word-aligned quadword in memory. The unit of
atomicity for STQF is 4 bytes (one word).

These instructions access memory using the implicit ASI (see page 104). The effective
address for these instructions is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.
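
The effective-address rule above can be sketched in a few lines of Python. This is
an illustrative model only, not part of the architecture definition: the function
and parameter names are hypothetical, register values are passed in as plain
integers, and the result is simply wrapped to 64 bits (any address masking
controlled by other processor state is ignored).

```python
MASK64 = (1 << 64) - 1


def sign_ext13(simm13: int) -> int:
    """Sign-extend the 13-bit immediate field (instruction bits 12:0)."""
    simm13 &= 0x1FFF
    return simm13 - 0x2000 if simm13 & 0x1000 else simm13


def effective_address(r_rs1: int, i: int, r_rs2: int = 0, simm13: int = 0) -> int:
    """Return R[rs1] + R[rs2] when i = 0, or R[rs1] + sign_ext(simm13) when i = 1."""
    offset = r_rs2 if i == 0 else sign_ext13(simm13)
    return (r_rs1 + offset) & MASK64
```

For example, with R[rs1] = 0x1000 and simm13 = 0x1FFF (that is, -1) the sketch
yields 0xFFF, while with i = 0 and R[rs2] = 0x20 it yields 0x1020.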


Exceptions. An attempt to execute a STF or STDF instruction when i = 0 and
instruction bits 12:5 are nonzero causes an illegal_instruction exception.

If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the
FPU is not present, then an attempt to execute a STF or STDF instruction causes an
fp_disabled exception.

STF causes a mem_address_not_aligned exception if the effective memory address is
not word-aligned.

STDF requires only word alignment in memory. However, if the effective address is
word-aligned but not doubleword-aligned, an attempt to execute an STDF
instruction causes an STDF_mem_address_not_aligned exception. In this case, trap
handler software must emulate the STDF instruction and return (impl. dep.
#110-V9-Cs10(a)).


STQF requires only word alignment in memory. If the effective address is
word-aligned but not quadword-aligned, an attempt to execute an STQF instruction
causes an STQF_mem_address_not_aligned exception. In this case, trap handler
software must emulate the STQF instruction and return (impl. dep. #112-V9-Cs10(a)).
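
The alignment rules for STF, STDF, and STQF can be summarized by the following
Python sketch. It is a hedged illustration, not a definitive model: the function
name is hypothetical, it assumes that an STDF or STQF address that is not even
word-aligned raises the ordinary mem_address_not_aligned exception, and it models
only the architectural rule (on UltraSPARC Architecture 2005 processors STQF
traps with illegal_instruction before any alignment check happens in hardware).

```python
from typing import Optional


def fp_store_alignment_trap(mnemonic: str, address: int) -> Optional[str]:
    """Return the alignment exception a store would raise, or None if aligned."""
    if mnemonic not in ("STF", "STDF", "STQF"):
        raise ValueError("expected STF, STDF, or STQF")
    if address % 4 != 0:
        # Assumption: all three instructions require at least word alignment.
        return "mem_address_not_aligned"
    if mnemonic == "STDF" and address % 8 != 0:
        return "STDF_mem_address_not_aligned"   # handler emulates the STDF
    if mnemonic == "STQF" and address % 16 != 0:
        return "STQF_mem_address_not_aligned"   # handler emulates the STQF
    return None
```

A trap handler for the two instruction-specific exceptions would typically
perform the store as a sequence of word-sized accesses, which matches the 4-byte
unit of atomicity stated for STDF and STQF.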





Programming | Some compilers issued sequences of single-precision stores for 
Note | SPARC V8 processor targets when the compiler could not 
determine whether doubleword or quadword operands were 
properly aligned. For SPARC V9, since emulation of misaligned 
stores is expected to be fast, compilers should issue sets of single- 
precision stores only when they can determine that double- or 
quadword operands are not properly aligned.


An attempt to execute an STQF instruction when rd{1} ≠ 0 causes an
fp_exception_other (FSR.ftt = invalid_fp_register) exception.


Implementation | Since UltraSPARC Architecture 2005 processors do not implement
Note | in hardware instructions (including STQF) that refer to quad-
precision floating-point registers, the
STQF_mem_address_not_aligned and fp_exception_other (with
FSR.ftt = invalid_fp_register) exceptions do not occur in
hardware. However, their effects must be emulated by software
when the instruction causes an illegal_instruction exception and
subsequent trap.





Exceptions  illegal_instruction
            fp_disabled
            STDF_mem_address_not_aligned
            STQF_mem_address_not_aligned (not used in UltraSPARC Architecture 2005)
            mem_address_not_aligned
            fp_exception_other (FSR.ftt = invalid_fp_register (STQF only))
            VA_watchpoint

See Also    Load Floating-Point Register on page 236
            Block Store on page 317
            Store Floating-Point into Alternate Space on page 323
            Store Floating-Point State Register (Lower) on page 327
            Store Short Floating-Point on page 332
            Store Partial Floating-Point on page 329
            Store Floating-Point State Register on page 339

STFA / STDFA / STQFA





7.92 Store Floating-Point into Alternate Space

Instruction   op3      rd    Operation                       Assembly Language Syntax            Class
STFA^P_ASI    11 0100  0-31  Store Floating-Point Register   sta  freg_rd, [regaddr] imm_asi     A1
                             to Alternate Space              sta  freg_rd, [reg_plus_imm] %asi
STDFA^P_ASI   11 0111  †     Store Double Floating-Point     stda freg_rd, [regaddr] imm_asi     A1
                             Register to Alternate Space     stda freg_rd, [reg_plus_imm] %asi
STQFA^P_ASI   11 0110  †     Store Quad Floating-Point       stqa freg_rd, [regaddr] imm_asi     C3
                             Register to Alternate Space     stqa freg_rd, [reg_plus_imm] %asi

† Encoded floating-point register value, as described on page 51.

Instruction format (bits 31:30 | 29:25 | 24:19 | 18:14 | 13 | 12:5 | 4:0):
  op = 11 | rd | op3 | rs1 | i = 0 | imm_asi | rs2
  op = 11 | rd | op3 | rs1 | i = 1 | simm13 (bits 12:0)

Description


The store single floating-point into alternate space instruction (STFA) copies the
contents of the 32-bit floating-point register F_S[rd] into memory.

The store double floating-point into alternate space instruction (STDFA) copies the
contents of 64-bit floating-point register F_D[rd] into a word-aligned doubleword in
memory. The unit of atomicity for STDFA is 4 bytes (one word).

The store quad floating-point into alternate space instruction (STQFA) copies the
contents of 128-bit floating-point register F_Q[rd] into a word-aligned quadword in
memory. The unit of atomicity for STQFA is 4 bytes (one word).

Store floating-point into alternate space instructions contain the address space
identifier (ASI) to be used for the store in the imm_asi field if i = 0 or in the ASI
register if i = 1. The access is privileged if bit 7 of the ASI is 0; otherwise, it is not
privileged. The effective address for these instructions is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.


Programming | Some compilers issued sequences of single-precision stores for 
Note | SPARC V8 processor targets when the compiler could not 
determine whether doubleword or quadword operands were 
properly aligned. For SPARC V9, since emulation of misaligned 
stores is expected to be fast, compilers should issue sets of single- 
precision stores only when they can determine that double- or 
quadword operands are not properly aligned. 


Exceptions. STFA causes a mem_address_not_aligned exception if the effective
memory address is not word-aligned.

STDFA requires only word alignment in memory. However, if the effective address is
word-aligned but not doubleword-aligned, an attempt to execute an STDFA
instruction causes an STDF_mem_address_not_aligned exception. In this case, trap
handler software must emulate the STDFA instruction and return (impl. dep.
#110-V9-Cs10(b)).

STQFA requires only word alignment in memory. However, if the effective address is
word-aligned but not quadword-aligned, an attempt to execute an STQFA
instruction may cause an STQF_mem_address_not_aligned exception. In this case,
the trap handler software must emulate the STQFA instruction and return (impl.
dep. #112-V9-Cs10(b)).





Implementation | STDFA shares an opcode with the STBLOCKF, STPARTIALF, 
Note | and STSHORTF instructions; it is distinguished by the ASI used. 


An attempt to execute an STQFA instruction when rd{1} ≠ 0 causes an
fp_exception_other (FSR.ftt = invalid_fp_register) exception.


Implementation | Since UltraSPARC Architecture 2005 processors do not implement
Note | in hardware instructions (including STQFA) that refer to quad-
precision floating-point registers, the
STQF_mem_address_not_aligned and fp_exception_other (with
FSR.ftt = invalid_fp_register) exceptions do not occur in
hardware. However, their effects must be emulated by software
when the instruction causes an illegal_instruction exception and
subsequent trap.





In nonprivileged mode (PSTATE.priv = 0), if bit 7 of the ASI is 0, this instruction
causes a privileged_action exception. In privileged mode (PSTATE.priv = 1), if the
ASI is in the range 30₁₆ to 7F₁₆, this instruction causes a privileged_action exception.
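
The two privilege rules in the preceding paragraph can be expressed as a small
predicate. The Python sketch below is illustrative only; the function name is
hypothetical, and it deliberately covers just the nonprivileged and privileged
cases named in the text (it does not model hyperprivileged execution or the
separate check that the ASI is valid for the instruction).

```python
def causes_privileged_action(asi: int, pstate_priv: int) -> bool:
    """Return True if a store with this ASI raises privileged_action."""
    if not 0 <= asi <= 0xFF:
        raise ValueError("ASI must be an 8-bit value")
    if pstate_priv == 0:
        # Nonprivileged mode: any ASI with bit 7 clear (00-7F hex) is restricted.
        return (asi & 0x80) == 0
    # Privileged mode: only ASIs 30-7F hex are off limits.
    return 0x30 <= asi <= 0x7F
```

For instance, ASI 0x04 traps in nonprivileged mode but not in privileged mode,
while ASI 0x30 traps in both.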


STFA and STQFA can be used with any of the following ASIs, subject to the privilege
mode rules described for the privileged_action exception above. Use of any other ASI
with these instructions causes a data_access_exception exception.





ASIs valid for STFA and STQFA

ASI_NUCLEUS                 ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY      ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY    ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                    ASI_REAL_LITTLE
ASI_REAL_IO                 ASI_REAL_IO_LITTLE
ASI_PRIMARY                 ASI_PRIMARY_LITTLE
ASI_SECONDARY               ASI_SECONDARY_LITTLE


STDFA can be used with any of the following ASIs, subject to the privilege mode
rules described for the privileged_action exception above. Use of any other ASI with
the STDFA instruction causes a data_access_exception exception.





ASIs valid for STDFA

ASI_NUCLEUS                        ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY             ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY           ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                           ASI_REAL_LITTLE
ASI_REAL_IO                        ASI_REAL_IO_LITTLE
ASI_PRIMARY                        ASI_PRIMARY_LITTLE
ASI_SECONDARY                      ASI_SECONDARY_LITTLE
ASI_BLOCK_AS_IF_USER_PRIMARY †     ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE †
ASI_BLOCK_AS_IF_USER_SECONDARY †   ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE †
ASI_BLOCK_PRIMARY †                ASI_BLOCK_PRIMARY_LITTLE †
ASI_BLOCK_SECONDARY †              ASI_BLOCK_SECONDARY_LITTLE †
ASI_BLOCK_COMMIT_PRIMARY †
ASI_BLOCK_COMMIT_SECONDARY †
ASI_FL8_PRIMARY ‡                  ASI_FL8_PRIMARY_LITTLE ‡
ASI_FL8_SECONDARY ‡                ASI_FL8_SECONDARY_LITTLE ‡
ASI_FL16_PRIMARY ‡                 ASI_FL16_PRIMARY_LITTLE ‡
ASI_FL16_SECONDARY ‡               ASI_FL16_SECONDARY_LITTLE ‡
ASI_PST8_PRIMARY *                 ASI_PST8_PRIMARY_LITTLE *
ASI_PST8_SECONDARY *               ASI_PST8_SECONDARY_LITTLE *
ASI_PST16_PRIMARY *                ASI_PST16_PRIMARY_LITTLE *
ASI_PST16_SECONDARY *              ASI_PST16_SECONDARY_LITTLE *
ASI_PST32_PRIMARY *                ASI_PST32_PRIMARY_LITTLE *
ASI_PST32_SECONDARY *              ASI_PST32_SECONDARY_LITTLE *

† If this ASI is used with the opcode for STDFA, the STBLOCKF instruction is
executed instead of STDFA. For behavior of STBLOCKF, see Block Store on page 317.
‡ If this ASI is used with the opcode for STDFA, the STSHORTF instruction
is executed instead of STDFA. For behavior of STSHORTF, see
Store Short Floating-Point on page 332.
* If this ASI is used with the opcode for STDFA, the STPARTIALF instruction
is executed instead of STDFA. For behavior of STPARTIALF, see
Store Partial Floating-Point on page 329.


Exceptions  illegal_instruction
            fp_disabled
            STDF_mem_address_not_aligned
            STQF_mem_address_not_aligned (STQFA only) (not used in UA-2005)
            mem_address_not_aligned
            fp_exception_other (FSR.ftt = invalid_fp_register (STQFA only))
            privileged_action
            VA_watchpoint

See Also    Load Floating-Point from Alternate Space on page 239
            Block Store on page 317
            Store Floating-Point on page 321
            Store Short Floating-Point on page 332
            Store Partial Floating-Point on page 329

STFSR (Deprecated)





7.93 Store Floating-Point State Register (Lower)

The STFSR instruction is deprecated and should not be used in new software.
The STXFSR instruction should be used instead.

Opcode   op3      rd    Operation                                    Assembly Language Syntax  Class
STFSR^D  10 0101  0     Store Floating-Point State Register (Lower)  st %fsr, [address]        D2
-        10 0101  1-31  (see page 339)

Instruction format (bits 31:30 | 29:25 | 24:19 | 18:14 | 13 | 12:5 | 4:0):
  op = 11 | rd | op3 | rs1 | i = 0 | (unused) | rs2
  op = 11 | rd | op3 | rs1 | i = 1 | simm13 (bits 12:0)


Description The Store Floating-point State Register (Lower) instruction (STFSR) waits for any
currently executing FPop instructions to complete, and then it writes the
less-significant 32 bits of FSR into memory.

After writing the FSR to memory, STFSR zeroes FSR.ftt.


V9 Compatibility | FSR.ftt should not be zeroed until it is known that the store will
Note | not cause a precise trap.


STFSR accesses memory using the implicit ASI (see page 104). The effective address
for this instruction is "R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if
i = 1.

An attempt to execute a STFSR instruction when i = 0 and instruction bits 12:5 are
nonzero causes an illegal_instruction exception.

If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the
FPU is not present, then an attempt to execute a STFSR instruction causes an
fp_disabled exception.


STFSR causes a mem_address_not_aligned exception if the effective memory 
address is not word-aligned. 


V9 Compatibility | Although STFSR is deprecated, UltraSPARC Architecture
Note | implementations continue to support it for compatibility with
existing SPARC V8 software. The STFSR instruction is defined
to store only the less-significant 32 bits of the FSR into memory,
while STXFSR allows SPARC V9 software to store all 64 bits of
the FSR.


Implementation | STFSR shares an opcode with the STXFSR instruction (and
Note | possibly with other implementation-dependent instructions);
they are differentiated by the instruction rd field. An attempt to
execute the op = 11₂, op3 = 10 0101₂ opcode with an invalid rd
value causes an illegal_instruction exception.





Exceptions  illegal_instruction
            fp_disabled
            mem_address_not_aligned
            VA_watchpoint

See Also    Store Floating-Point on page 321
            Store Floating-Point State Register on page 339

STPARTIALF





7.94 Store Partial Floating-Point

             ASI
Instruction  Value  Operation                                                            Assembly Language Syntax †                       Class
STPARTIALF   C0₁₆   Eight 8-bit conditional stores to primary address space              stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST8_P     C3
STPARTIALF   C1₁₆   Eight 8-bit conditional stores to secondary address space            stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST8_S     C3
STPARTIALF   C8₁₆   Eight 8-bit conditional stores to primary address space,             stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST8_PL    C3
                    little-endian
STPARTIALF   C9₁₆   Eight 8-bit conditional stores to secondary address space,           stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST8_SL    C3
                    little-endian
STPARTIALF   C2₁₆   Four 16-bit conditional stores to primary address space              stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST16_P    C3
STPARTIALF   C3₁₆   Four 16-bit conditional stores to secondary address space            stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST16_S    C3
STPARTIALF   CA₁₆   Four 16-bit conditional stores to primary address space,             stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST16_PL   C3
                    little-endian
STPARTIALF   CB₁₆   Four 16-bit conditional stores to secondary address space,           stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST16_SL   C3
                    little-endian
STPARTIALF   C4₁₆   Two 32-bit conditional stores to primary address space               stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST32_P    C3
STPARTIALF   C5₁₆   Two 32-bit conditional stores to secondary address space             stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST32_S    C3
STPARTIALF   CC₁₆   Two 32-bit conditional stores to primary address space,              stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST32_PL   C3
                    little-endian
STPARTIALF   CD₁₆   Two 32-bit conditional stores to secondary address space,            stda freg_rd, reg_rs2, [reg_rs1] #ASI_PST32_SL   C3
                    little-endian

† The original assembly language syntax for a Partial Store instruction
("stda freg_rd, [reg_rs1] reg_rs2, imm_asi") has been deprecated because of
inconsistency with the rest of the SPARC assembly language. Over time, assemblers
will support the new syntax for this instruction. In the meantime, some existing
assemblers may only recognize the original syntax.

Instruction format (bits 31:30 | 29:25 | 24:19 | 18:14 | 13 | 12:5 | 4:0):
  op = 11 | rd | op3 = 11 0111 | rs1 | i = 0 | imm_asi | rs2

Description The partial store instructions are selected by one of the partial store ASIs with the
STDFA instruction.


Two 32-bit, four 16-bit, or eight 8-bit values from the 64-bit floating-point register
F_D[rd] are conditionally stored at the address specified by R[rs1], using the mask
specified in R[rs2]. STPARTIALF has the effect of merging selected data from its
source register, F_D[rd], into the existing data at the corresponding destination
locations.

The mask value in R[rs2] has the same format as the result specified by the pixel
compare instructions (see SIMD Signed Compare on page 166). The most significant
bit of the mask (not of the entire register) corresponds to the most significant part of
F_D[rd]. The data is stored in little-endian form in memory if the ASI name has an "L"
(or "_LITTLE") suffix; otherwise, it is stored in big-endian format.








R[rs2] 8-bit partial store mask for ASI_PST8_* (mask bits 7:0):
  bit 7: mask for bits 63:56
  bit 6: mask for bits 55:48
  ...
  bit 1: mask for bits 15:8
  bit 0: mask for bits 7:0

R[rs2] 16-bit partial store mask for ASI_PST16_* (mask bits 3:0):
  bit 3: mask for bits 63:48
  bit 2: mask for bits 47:32
  bit 1: mask for bits 31:16
  bit 0: mask for bits 15:0

R[rs2] 32-bit partial store mask for ASI_PST32_* (mask bits 1:0):
  bit 1: mask for bits 63:32
  bit 0: mask for bits 31:0

FIGURE 7-29 Mask Format for Partial Store
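
Because UltraSPARC Architecture 2005 implementations emulate partial stores in
software, it may help to see the merge spelled out. The following Python sketch is
a minimal illustration under stated assumptions: the function name is
hypothetical, memory is modeled as an 8-byte `bytes` value, and only the
big-endian ASIs are covered (the little-endian variants are intentionally left
out).

```python
def partial_store_big_endian(existing: bytes, fd_value: int,
                             mask: int, elem_bits: int) -> bytes:
    """Merge the masked elements of a 64-bit register value into 8 bytes of memory."""
    if elem_bits not in (8, 16, 32):
        raise ValueError("element size must be 8, 16, or 32 bits")
    if len(existing) != 8:
        raise ValueError("a partial store covers exactly 8 bytes")
    size = elem_bits // 8                  # bytes per element
    count = 8 // size                      # number of mask bits in use
    src = (fd_value & ((1 << 64) - 1)).to_bytes(8, "big")
    out = bytearray(existing)
    for k in range(count):
        if (mask >> k) & 1:
            # Mask bit k guards register bits (k+1)*elem_bits-1 : k*elem_bits,
            # which sit 'size' bytes wide, counted back from the end of the doubleword.
            start = 8 - (k + 1) * size
            out[start:start + size] = src[start:start + size]
    return bytes(out)
```

With a register value of 0x1122334455667788, zero-filled memory, and an 8-bit
mask of 1000 0001 (binary), only the first byte (0x11) and the last byte (0x88) of
the doubleword are written; the six bytes in between keep their old contents.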








Exceptions. In an UltraSPARC Architecture 2005 implementation, these
instructions are not implemented in hardware, cause a data_access_exception
exception, and are emulated in software.

An attempt to execute a STPARTIALF instruction when i = 1 causes an
illegal_instruction exception.


If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the 
FPU is not present, then an attempt to execute a STPARTIALF instruction causes an 
fp_disabled exception. 


STPARTIALF causes a mem_address_not_aligned exception if the effective memory 
address is not word-aligned. 


STPARTIALF requires only word alignment in memory for eight-byte stores. If the
effective address is word-aligned but not doubleword-aligned, it generates an
STDF_mem_address_not_aligned exception. In this case, the trap handler software
shall emulate the STDFA instruction and return.

IMPL. DEP. #249-U3-Cs10: For an STPARTIALF instruction, the following aspects of
data watchpoints are implementation dependent: (a) whether data watchpoint logic
examines the byte store mask in R[rs2] or it conservatively behaves as if every
Partial Store always stores all 8 bytes, and (b) whether data watchpoint logic
examines individual bits in the Virtual (Physical) Data Watchpoint Mask in the LSU
Control register DCUCR to determine which bytes are being watched or (when the
Watchpoint Mask is nonzero) it conservatively behaves as if all 8 bytes are being
watched.

ASIs C0₁₆-C5₁₆ and C8₁₆-CD₁₆ are only used for partial store operations. In
particular, they should not be used with the LDDFA instruction; however, if any of
them is used, the resulting behavior is specified in the LDDFA instruction
description on page 241.


Implementation | STPARTIALF shares an opcode with the STBLOCKF, STDFA, 
Note | and STSHORTF instructions; it is distinguished by the ASI used. 


Exceptions  illegal_instruction
            fp_disabled
            data_access_exception (not implemented in hardware in UA-2005)

STSHORTF





7.95 Store Short Floating-Point

             ASI
Instruction  Value  Operation                                                Assembly Language Syntax               Class
STSHORTF     D0₁₆   8-bit store to primary address space                     stda freg_rd, [regaddr] #ASI_FL8_P     C3
                                                                             stda freg_rd, [reg_plus_imm] %asi
STSHORTF     D1₁₆   8-bit store to secondary address space                   stda freg_rd, [regaddr] #ASI_FL8_S     C3
                                                                             stda freg_rd, [reg_plus_imm] %asi
STSHORTF     D8₁₆   8-bit store to primary address space, little-endian      stda freg_rd, [regaddr] #ASI_FL8_PL    C3
                                                                             stda freg_rd, [reg_plus_imm] %asi
STSHORTF     D9₁₆   8-bit store to secondary address space, little-endian    stda freg_rd, [regaddr] #ASI_FL8_SL    C3
                                                                             stda freg_rd, [reg_plus_imm] %asi
STSHORTF     D2₁₆   16-bit store to primary address space                    stda freg_rd, [regaddr] #ASI_FL16_P    C3
                                                                             stda freg_rd, [reg_plus_imm] %asi
STSHORTF     D3₁₆   16-bit store to secondary address space                  stda freg_rd, [regaddr] #ASI_FL16_S    C3
                                                                             stda freg_rd, [reg_plus_imm] %asi
STSHORTF     DA₁₆   16-bit store to primary address space, little-endian     stda freg_rd, [regaddr] #ASI_FL16_PL   C3
                                                                             stda freg_rd, [reg_plus_imm] %asi
STSHORTF     DB₁₆   16-bit store to secondary address space, little-endian   stda freg_rd, [regaddr] #ASI_FL16_SL   C3
                                                                             stda freg_rd, [reg_plus_imm] %asi

Instruction format (bits 31:30 | 29:25 | 24:19 | 18:14 | 13 | 12:5 | 4:0):
  op = 11 | rd | op3 = 11 0111 | rs1 | i = 0 | imm_asi | rs2
  op = 11 | rd | op3 = 11 0111 | rs1 | i = 1 | simm13 (bits 12:0)


Description The short floating-point store instruction allows 8- and 16-bit stores to be performed
from the floating-point registers. Short stores access the low-order 8 or 16 bits of the
register.

Little-endian ASIs transfer data in little-endian format to memory; otherwise,
memory is assumed to be big-endian. Short stores are typically used with the
FALIGNDATA instruction (see Align Data on page 161) to assemble or store 64 bits
on noncontiguous components.


Implementation | STSHORTF shares an opcode with the STBLOCKF, STDFA, and
Note | STPARTIALF instructions; it is distinguished by the ASI used.


In an UltraSPARC Architecture 2005 implementation, these instructions are not
implemented in hardware, cause a data_access_exception exception, and are
emulated in software.


If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the 
FPU is not present, then an attempt to execute a STSHORTF instruction causes an 
fp_disabled exception. 


STSHORTF causes a mem_address_not_aligned exception if the effective memory
address is not halfword-aligned.

An 8-bit STSHORTF (using ASI D0₁₆, D1₁₆, D8₁₆, or D9₁₆) can be performed to an
arbitrary memory address (no alignment requirement).

A 16-bit STSHORTF (using ASI D2₁₆, D3₁₆, DA₁₆, or DB₁₆) to an address that is not
halfword-aligned (an odd address) causes a mem_address_not_aligned exception.
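
Since short stores are emulated in software on UltraSPARC Architecture 2005
implementations, the data path of such an emulation might look like the Python
sketch below. It is illustrative only: the function name and the returned
(exception, bytes) pair are hypothetical conventions, and the sketch covers only
the width, alignment, and byte-order rules stated above.

```python
from typing import Optional, Tuple


def short_store(fd_value: int, address: int, bits: int,
                little_endian: bool = False) -> Tuple[Optional[str], bytes]:
    """Return (exception name or None, bytes written) for an 8- or 16-bit short store."""
    if bits not in (8, 16):
        raise ValueError("short stores are 8 or 16 bits wide")
    if bits == 16 and address % 2 != 0:
        # Only the 16-bit form has an alignment requirement.
        return "mem_address_not_aligned", b""
    low = fd_value & ((1 << bits) - 1)      # low-order bits of the register
    order = "little" if little_endian else "big"
    return None, low.to_bytes(bits // 8, order)
```

For a register value of 0x1122334455667788, an 8-bit store writes the single byte
0x88 at any address, and a 16-bit store to an even address writes 0x77 0x88
(big-endian ASI) or 0x88 0x77 (little-endian ASI).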


Exceptions  VA_watchpoint
            data_access_exception

STTW (Deprecated)





7.96 Store Integer Twin Word

The STTW instruction is deprecated and should not be used in new software.
The STX instruction should be used instead.

Opcode  op3      Operation                Assembly Language Syntax †  Class
STTW^D  00 0111  Store Integer Twin Word  sttw reg_rd, [address]      D2

† The original assembly language syntax for this instruction used an "std" instruction mnemonic, which is now
deprecated. Over time, assemblers will support the new "sttw" mnemonic for this instruction. In the meantime,
some existing assemblers may only recognize the original "std" mnemonic.

Instruction format (bits 31:30 | 29:25 | 24:19 | 18:14 | 13 | 12:5 | 4:0):
  op = 11 | rd | op3 | rs1 | i = 0 | (unused) | rs2
  op = 11 | rd | op3 | rs1 | i = 1 | simm13 (bits 12:0)


Description The store integer twin word instruction (STTW) copies two words from an R register
pair into memory. The least significant 32 bits of the even-numbered R register are
written into memory at the effective address, and the least significant 32 bits of the
following odd-numbered R register are written into memory at the "effective
address + 4".

The least significant bit of the rd field of a store twin word instruction is unused and
should always be set to 0 by software.

STTW accesses memory using the implicit ASI (see page 104). The effective address
for this instruction is "R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if
i = 1.

A successful store twin word instruction operates atomically.

IMPL. DEP. #108-V9a: It is implementation dependent whether STTW is
implemented in hardware. If not, an attempt to execute it will cause an
unimplemented_STTW exception. (STTW is implemented in hardware in all
UltraSPARC Architecture 2005 implementations.)

An attempt to execute an STTW instruction when either of the following conditions
exists causes an illegal_instruction exception:

■ destination register number rd is an odd number (is misaligned)
■ i = 0 and instruction bits 12:5 are nonzero


STTW causes a mem_address_not_aligned exception if the effective address is not
doubleword-aligned.

With respect to little-endian memory, an STTW instruction behaves as if it is
composed of two 32-bit stores, each of which is byte-swapped independently before
being written into its respective destination memory word.
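
The memory image produced by STTW, including the little-endian behavior just
described, can be illustrated with a short Python sketch. The function name is
hypothetical and the register contents are passed in as plain integers; the
sketch shows only the byte layout, not atomicity or exception checks.

```python
def sttw_memory_image(r_even: int, r_odd: int, little_endian: bool = False) -> bytes:
    """Return the 8 bytes written at the effective address by a store twin word."""
    order = "little" if little_endian else "big"
    word0 = r_even & 0xFFFFFFFF   # low 32 bits of the even register, at EA
    word1 = r_odd & 0xFFFFFFFF    # low 32 bits of the odd register, at EA + 4
    # Each word is byte-swapped on its own; the two words never trade places.
    return word0.to_bytes(4, order) + word1.to_bytes(4, order)
```

Note that the little-endian result differs from a byte-swapped 64-bit store: the
word from the even-numbered register still lands at the lower address.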


Programming | STTW is provided for compatibility with SPARC V8. It may
Notes | execute slowly on SPARC V9 machines because of data path and
register-access difficulties. Therefore, software should avoid
using STTW.

If STTW is emulated in software, the STX instruction should be
used for the memory access in the emulation code to preserve
atomicity.


Exceptions  unimplemented_STTW
            illegal_instruction
            mem_address_not_aligned
            VA_watchpoint

See Also    STW/STX on page 313
            STTWA on page 336

STTWA (Deprecated)





7.97 Store Integer Twin Word into Alternate Space

The STTWA instruction is deprecated and should not be used in new software.
The STXA instruction should be used instead.

Opcode          op3      Operation                             Assembly Language Syntax †          Class
STTWA^D,P_ASI   01 0111  Store Twin Word into Alternate Space  sttwa reg_rd, [regaddr] imm_asi     D2, Y3 ‡
                                                               sttwa reg_rd, [reg_plus_imm] %asi

† The original assembly language syntax for this instruction used an "stda" instruction mnemonic, which is now deprecated. Over
time, assemblers will support the new "sttwa" mnemonic for this instruction. In the meantime, some existing assemblers may only
recognize the original "stda" mnemonic.

‡ Y3 for restricted ASIs (00₁₆-7F₁₆); D2 for unrestricted ASIs (80₁₆-FF₁₆)

Instruction format (bits 31:30 | 29:25 | 24:19 | 18:14 | 13 | 12:5 | 4:0):
  op = 11 | rd | op3 | rs1 | i = 0 | imm_asi | rs2
  op = 11 | rd | op3 | rs1 | i = 1 | simm13 (bits 12:0)


Description The store twin word integer into alternate space instruction (STTWA) copies two 
words from an R register pair into memory. The least significant 32 bits of the even- 
numbered R register are written into memory at the effective address, and the least 
significant 32 bits of the following odd-numbered R register are written into memory 
at the “effective address + 4”. 


The least significant bit of the rd field of an STTWA instruction is unused and should 
always be set to 0 by software. 


Store integer twin word to alternate space instructions contain the address space
identifier (ASI) to be used for the store in the imm_asi field if i = 0, or in the ASI
register if i = 1. The access is privileged if bit 7 of the ASI is 0; otherwise, it is not
privileged. The effective address for these instructions is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.

A successful store twin word instruction operates atomically.

With respect to little-endian memory, an STTWA instruction behaves as if it is
composed of two 32-bit stores, each of which is byte-swapped independently before
being written into its respective destination memory word.


IMPL. DEP. #108-V9b: It is implementation dependent whether STTWA is 
implemented in hardware. If not, an attempt to execute it will cause an 
unimplemented_STTW exception. (STTWA is implemented in hardware in all 
UltraSPARC Architecture 2005 implementations.) 


An attempt to execute an STTWA instruction with a misaligned (odd) destination 
register number rd causes an illegal_instruction exception. 


STTWA causes a mem_address_not_aligned exception if the effective address is not 
doubleword-aligned. 


In nonprivileged mode (PSTATE.priv = 0), if bit 7 of the ASI is 0, this instruction
causes a privileged_action exception. In privileged mode (PSTATE.priv = 1), if the
ASI is in the range 30₁₆ to 7F₁₆, this instruction causes a privileged_action exception.

STTWA can be used with any of the following ASIs, subject to the privilege mode
rules described for the privileged_action exception above. Use of any other ASI with
this instruction causes a data_access_exception exception (impl. dep. #300-U4-Cs10).

ASIs valid for STTWA

ASI_NUCLEUS                 ASI_NUCLEUS_LITTLE
ASI_AS_IF_USER_PRIMARY      ASI_AS_IF_USER_PRIMARY_LITTLE
ASI_AS_IF_USER_SECONDARY    ASI_AS_IF_USER_SECONDARY_LITTLE
ASI_REAL                    ASI_REAL_LITTLE
ASI_REAL_IO                 ASI_REAL_IO_LITTLE
ASI_PRIMARY                 ASI_PRIMARY_LITTLE
ASI_SECONDARY               ASI_SECONDARY_LITTLE








Programming | Nontranslating ASIs (see page 399) may only be accessed using
Note | STXA (not STTWA) instructions. If an STTWA referencing a
nontranslating ASI is executed, per the above table, it generates
a data_access_exception exception (impl. dep. #300-U4-Cs10).


Programming | STTWA is provided for compatibility with existing SPARC V8
Note | software. It may execute slowly on SPARC V9 machines because
of data path and register-access difficulties. Therefore, software
should avoid using STTWA.

If STTWA is emulated in software, the STXA instruction should
be used for the memory access in the emulation code to preserve
atomicity.





Exceptions  unimplemented_STTW
            illegal_instruction
            mem_address_not_aligned
            privileged_action
            VA_watchpoint

See Also    STWA/STXA on page 314
            STTW on page 334

STXFSR





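7.98 Store Floating-Point State Register

Instruction  op3      rd    Operation                            Assembly Language     Class
-            10 0101  0     (see page 327)
STXFSR       10 0101  1     Store Floating-Point State register  stx %fsr, [address]   A1
-            10 0101  2-31  Reserved

Instruction format (bits 31:30 | 29:25 | 24:19 | 18:14 | 13 | 12:5 | 4:0):
  op = 11 | rd | op3 | rs1 | i = 0 | (unused) | rs2
  op = 11 | rd | op3 | rs1 | i = 1 | simm13 (bits 12:0)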


Description The store floating-point state register instruction (STXFSR) waits for any currently
executing FPop instructions to complete, and then it writes all 64 bits of the FSR into
memory.

STXFSR zeroes FSR.ftt after writing the FSR to memory.
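
The opcode table above shows that op3 = 10 0101 selects between the deprecated
STFSR (rd = 0), STXFSR (rd = 1), and reserved encodings (rd = 2-31). The Python
sketch below illustrates which bytes each variant writes. It is a hedged
illustration: the function name is hypothetical, a big-endian implicit ASI is
assumed, a Python ValueError stands in for the illegal_instruction exception, and
the subsequent zeroing of FSR.ftt is not modeled.

```python
MASK64 = (1 << 64) - 1


def store_fsr_bytes(rd: int, fsr: int) -> bytes:
    """Return the bytes written to memory by the op3 = 10 0101 store for a given rd."""
    fsr &= MASK64
    if rd == 0:
        # STFSR: only the less-significant 32 bits of the FSR are stored.
        return (fsr & 0xFFFFFFFF).to_bytes(4, "big")
    if rd == 1:
        # STXFSR: all 64 bits of the FSR are stored.
        return fsr.to_bytes(8, "big")
    raise ValueError("illegal_instruction: reserved rd value")
```

This also shows why the two instructions have different alignment requirements:
STFSR writes a word, whereas STXFSR writes a doubleword.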


Implementation | FSR.ftt should not be zeroed by STXFSR until it is known that the
Note | store will not cause a precise trap.


STXFSR accesses memory using the implicit ASI (see page 104). The effective
address for this instruction is "R[rs1] + R[rs2]" if i = 0, or
"R[rs1] + sign_ext(simm13)" if i = 1.

Exceptions. An attempt to execute a STXFSR instruction when i = 0 and instruction
bits 12:5 are nonzero causes an illegal_instruction exception.

If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the
FPU is not present, then an attempt to execute a STXFSR instruction causes an
fp_disabled exception.

If the effective address is not doubleword-aligned, an attempt to execute an
STXFSR instruction causes a mem_address_not_aligned exception.


Implementation | STXFSR shares an opcode with the (deprecated) STFSR
Note | instruction (and possibly with other implementation-dependent
instructions); they are differentiated by the instruction rd field.
An attempt to execute the op = 11₂, op3 = 10 0101₂ opcode with
an invalid rd value causes an illegal_instruction exception.


Exceptions  illegal_instruction
            fp_disabled
            mem_address_not_aligned
            VA_watchpoint

See Also    Load Floating-Point State Register on page 258
            Store Floating-Point on page 321
            Store Floating-Point State Register (Lower) on page 327

SUB





7.99 Subtract 


Instruction op3 Operation Assembly Language Syntax Class 
SUB 00 0100 Subtract sub reg_rs1, reg_or_imm, reg_rd A1 
SUBcc 01 0100 Subtract and modify cc's subcc reg_rs1, reg_or_imm, reg_rd A1 
SUBC 00 1100 Subtract with Carry subc reg_rs1, reg_or_imm, reg_rd A1 
SUBCcc 01 1100 Subtract with Carry and modify cc's subccc reg_rs1, reg_or_imm, reg_rd A1 


[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 


Description These instructions compute "R[rs1] - R[rs2]" if i = 0, or 
"R[rs1] - sign_ext(simm13)" if i = 1, and write the difference into R[rd]. 


SUBC and SUBCcc ("SUBtract with carry") also subtract the CCR register's 32-bit 
carry (icc.c) bit; that is, they compute "R[rs1] - R[rs2] - icc.c" or 
"R[rs1] - sign_ext(simm13) - icc.c" and write the difference into R[rd]. 


SUBcc and SUBCcc modify the integer condition codes (CCR.icc and CCR.xcc). A 32-bit 
overflow (CCR.icc.v) occurs on subtraction if bit 31 (the sign) of the operands 
differs and bit 31 (the sign) of the difference differs from R[rs1]{31}. A 64-bit 
overflow (CCR.xcc.v) occurs on subtraction if bit 63 (the sign) of the operands differs 
and bit 63 (the sign) of the difference differs from R[rs1]{63}. 
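The two overflow rules can be checked side by side with a short model. This is an illustrative Python sketch only; `sub_overflow` and `subcc_overflow_bits` are hypothetical helper names, and the other condition-code bits (n, z, c) are deliberately left out.

```python
def sub_overflow(a: int, b: int, width: int) -> bool:
    """Overflow rule for a - b at the given width (32 or 64 bits).

    Overflow occurs when the operand signs differ and the sign of the
    difference differs from the sign of the first operand.
    """
    mask = (1 << width) - 1
    sign = width - 1
    a &= mask
    b &= mask
    diff = (a - b) & mask
    sign_a = (a >> sign) & 1
    sign_b = (b >> sign) & 1
    sign_d = (diff >> sign) & 1
    return sign_a != sign_b and sign_d != sign_a


def subcc_overflow_bits(r_rs1: int, operand2: int) -> dict:
    """Return the icc.v and xcc.v values a SUBcc would produce."""
    return {
        "icc.v": sub_overflow(r_rs1, operand2, 32),
        "xcc.v": sub_overflow(r_rs1, operand2, 64),
    }
```

Subtracting -1 from 0x7FFFFFFF shows the two views diverging: the 32-bit result wraps to a negative value (icc.v set), while the 64-bit result is still representable (xcc.v clear).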
Programming | A SUBcc instruction with rd = 0 can be used to effect a signed or 
Notes | unsigned integer comparison. See the cmp synthetic instruction in 
Appendix C, Assembly Language Syntax. 
SUBC and SUBCcc read the 32-bit condition codes’ carry bit 
(CCR.icc.c), not the 64-bit condition codes’ carry bit (CCR.xcc.c). 


An attempt to execute a SUB instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal_instruction exception. 


Exceptions illegal_instruction 
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SWAP (Deprecated) 





7.100 Swap Register with Memory 


The SWAP instruction is deprecated and should not be used in new software. 
The CASA or CASXA instruction should be used instead. 
Opcode op3 Operation Assembly Language Syntax Class 
SWAP^D 00 1111 Swap Register with Memory swap [address], reg_rd D2 











[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 


Description SWAP exchanges the less significant 32 bits of R[rd] with the contents of the word at 
the addressed memory location. The upper 32 bits of R[rd] are set to 0. The operation 
is performed atomically, that is, without allowing intervening interrupts or deferred 
traps. In a multiprocessor system, two or more virtual processors executing CASA, 
CASXA, SWAP, SWAPA, LDSTUB, or LDSTUBA instructions addressing any or all of 
the same doubleword simultaneously are guaranteed to execute them in an 
undefined, but serial, order. 


SWAP accesses memory using the implicit ASI (see page 104). The effective address 
for this instruction is "R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if 
i = 1. 


An attempt to execute a SWAP instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal_instruction exception. 


If the effective address is not word-aligned, an attempt to execute a SWAP instruction 
causes a mem_address_not_aligned exception. 


The coherence and atomicity of memory operations between virtual processors and 
I/O DMA memory accesses are implementation dependent (impl. dep. #120-V9). 


Exceptions illegal_instruction 
mem_address_not_aligned 
VA_watchpoint 
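The data movement performed by SWAP can be sketched as follows. This Python model is illustrative only: `swap_word` is a hypothetical name, memory is a plain `bytearray`, big-endian byte order is assumed, and atomicity (the main reason the instruction exists) is not modeled at all.

```python
def swap_word(memory: bytearray, addr: int, r_rd: int) -> int:
    """Exchange the low 32 bits of R[rd] with the word at addr.

    Returns the new value of R[rd]: the old memory word, zero-extended
    to 64 bits. Raises ValueError for a non-word-aligned address.
    """
    if addr % 4 != 0:
        raise ValueError("mem_address_not_aligned")
    # Assumption: big-endian access.
    old_word = int.from_bytes(memory[addr:addr + 4], "big")
    memory[addr:addr + 4] = (r_rd & 0xFFFFFFFF).to_bytes(4, "big")
    return old_word  # upper 32 bits are 0 by construction
```

Note that only the low word of the register reaches memory; the upper 32 bits of the old register value are discarded.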
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SWAPA (Deprecated) 





7.101 Swap Register with Alternate Space Memory 


The SWAPA instruction is deprecated and should not be used in new software. 
The CASXA instruction should be used instead. 








Opcode op3 Operation Assembly Language Syntax Class 
SWAPA^D,P_ASI 01 1111 Swap register with Alternate Space Memory swapa [regaddr] imm_asi, reg_rd D2, Y3† 
 swapa [reg_plus_imm] %asi, reg_rd 


† Y3 for restricted ASIs (00₁₆-7F₁₆); D2 for unrestricted ASIs (80₁₆-FF₁₆) 


[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 


Description SWAPA exchanges the less significant 32 bits of R[rd] with the contents of the word 
at the addressed memory location. The upper 32 bits of R[rd] are set to 0. The 
operation is performed atomically, that is, without allowing intervening interrupts 
or deferred traps. In a multiprocessor system, two or more virtual processors 
executing CASA, CASXA, SWAP, SWAPA, LDSTUB, or LDSTUBA instructions 
addressing any or all of the same doubleword simultaneously are guaranteed to 
execute them in an undefined, but serial, order. 


The SWAPA instruction contains the address space identifier (ASI) to be used for the 
load in the imm_asi field if i = 0, or in the ASI register if i = 1. The access is 
privileged if bit 7 of the ASI is 0; otherwise, it is not privileged. The effective address 
for this instruction is "R[rs1] + R[rs2]" if i = 0, or "R[rs1] + sign_ext(simm13)" if 
i = 1. 


This instruction causes a mem_address_not_aligned exception if the effective 
address is not word-aligned. It causes a privileged_action exception if 
PSTATE.priv = 0 and bit 7 of the ASI is 0. 


The coherence and atomicity of memory operations between virtual processors and 
I/O DMA memory accesses are implementation dependent (impl. dep. #120-V9). 


If the effective address is not word-aligned, an attempt to execute a SWAPA 
instruction causes a mem_address_not_aligned exception. 


In nonprivileged mode (PSTATE.priv = 0), if bit 7 of the ASI is 0, this instruction 
causes a privileged_action exception. In privileged mode (PSTATE.priv = 1), if the 
ASI is in the range 30₁₆ to 7F₁₆, this instruction causes a privileged_action exception. 


SWAPA can be used with any of the following ASIs, subject to the privilege mode 
rules described for the privileged_action exception above. Use of any other ASI with 
this instruction causes a data_access_exception exception. 





ASIs valid for SWAPA 

ASI_NUCLEUS ASI_NUCLEUS_LITTLE 
ASI_AS_IF_USER_PRIMARY ASI_AS_IF_USER_PRIMARY_LITTLE 
ASI_AS_IF_USER_SECONDARY ASI_AS_IF_USER_SECONDARY_LITTLE 
ASI_PRIMARY ASI_PRIMARY_LITTLE 
ASI_SECONDARY ASI_SECONDARY_LITTLE 
ASI_REAL ASI_REAL_LITTLE 


Exceptions mem_address_not_aligned 
privileged_action 
VA_watchpoint 
data_access_exception 
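The privilege rules for SWAPA reduce to two range checks on the ASI number. The sketch below is a hedged Python illustration, not the architecture's definition; `swapa_privilege_exception` is a hypothetical name, hyperprivileged mode is not modeled, and the separate data_access_exception check for ASIs outside the valid list is omitted.

```python
from typing import Optional


def swapa_privilege_exception(asi: int, pstate_priv: int) -> Optional[str]:
    """Return 'privileged_action' if the access is refused, else None."""
    restricted = (asi & 0x80) == 0  # bit 7 clear: restricted ASI (0x00-0x7F)
    if pstate_priv == 0 and restricted:
        return "privileged_action"
    if pstate_priv == 1 and 0x30 <= asi <= 0x7F:
        return "privileged_action"
    return None
```

Under this model, privileged software may use restricted ASIs below 30₁₆ but not those in 30₁₆ to 7F₁₆, and nonprivileged software is limited to ASIs with bit 7 set.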
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TADDcc 





7.102 Tagged Add 


Instruction op3 Operation Assembly Language Syntax Class 
TADDcc 10 0000 Tagged Add and modify cc's taddcc reg_rs1, reg_or_imm, reg_rd A1 





[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 


Description This instruction computes a sum that is "R[rs1] + R[rs2]" if i = 0, or 
"R[rs1] + sign_ext(simm13)" if i = 1. 


TADDcc modifies the integer condition codes (icc and xcc). 


A tag overflow condition occurs if bit 1 or bit 0 of either operand is nonzero or if the 
addition generates 32-bit arithmetic overflow (that is, both operands have the same 
value in bit 31 and bit 31 of the sum is different). 


If a TADDcc causes a tag overflow, the 32-bit overflow bit (CCR.icc.v) is set to 1; if 
TADDcc does not cause a tag overflow, CCR.icc.v is set to 0. 


In either case, the remaining integer condition codes (both the other CCR.icc bits and 
all the CCR.xcc bits) are also updated as they would be for a normal ADD 
instruction. In particular, the setting of the CCR.xcc.v bit is not determined by the 
tag overflow condition (tag overflow is used only to set the 32-bit overflow bit). 
CCR.xcc.v is set based on the 64-bit arithmetic overflow condition, like a normal 64-bit 
add. 


An attempt to execute a TADDcc instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal_instruction exception. 


Exceptions illegal_instruction 


See Also TADDccTV^D on page 346 
TSUBcc on page 351 
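The tag overflow condition for tagged add combines a tag-bit check with the ordinary 32-bit overflow test. The following Python sketch is illustrative; `taddcc` is a hypothetical helper that returns only the 64-bit sum and the resulting CCR.icc.v, leaving the other condition-code bits out.

```python
MASK64 = (1 << 64) - 1


def taddcc(r_rs1: int, operand2: int):
    """Return (sum, icc_v) for a tagged add of two 64-bit operands."""
    a = r_rs1 & MASK64
    b = operand2 & MASK64
    total = (a + b) & MASK64

    # Bit 1 or bit 0 of either operand nonzero.
    tag_bits_set = ((a | b) & 0b11) != 0

    # 32-bit arithmetic overflow: same sign in bit 31, sum's bit 31 differs.
    a31 = (a >> 31) & 1
    b31 = (b >> 31) & 1
    s31 = (total >> 31) & 1
    overflow32 = a31 == b31 and s31 != a31

    icc_v = 1 if (tag_bits_set or overflow32) else 0
    return total, icc_v
```

Operands such as 4 and 8 (low two bits clear) add without tag overflow, whereas an operand of 5 sets icc.v even though the arithmetic itself does not overflow.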
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TADDccTV (Deprecated) 


7.103 Tagged Add and Trap on Overflow 


The TADDccTV instruction is deprecated and should not be used in new 
software. The TADDcc instruction followed by the BPVS instruction (with 
instructions to save the pre-TADDcc integer condition codes if necessary) should 
be used instead. 








Opcode op3 Operation Assembly Language Syntax Class 
TADDccTV^D 10 0010 Tagged Add and modify cc's or Trap on Overflow taddcctv reg_rs1, reg_or_imm, reg_rd D2 





[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 


Description This instruction computes a sum that is "R[rs1] + R[rs2]" if i = 0, or 
"R[rs1] + sign_ext(simm13)" if i = 1. 


TADDccTV modifies the integer condition codes if it does not trap. 


An attempt to execute a TADDccTV instruction when i = 0 and instruction bits 12:5 
are nonzero causes an illegal_instruction exception. 


A tag overflow condition occurs if bit 1 or bit 0 of either operand is nonzero or if the 
addition generates 32-bit arithmetic overflow (that is, both operands have the same 
value in bit 31 and bit 31 of the sum is different). 


If TADDccTV causes a tag overflow, a tag_overflow exception is generated and R[rd] 
and the integer condition codes remain unchanged. If a TADDccTV does not cause a 
tag overflow, the sum is written into R[rd] and the integer condition codes are 
updated. CCR.icc.v is set to 0 to indicate no 32-bit overflow. 


In either case, the remaining integer condition codes (both the other CCR.icc bits and 
all the CCR.xcc bits) are also updated as they would be for a normal ADD 
instruction. In particular, the setting of the CCR.xcc.v bit is not determined by the 
tag overflow condition (tag overflow is used only to set the 32-bit overflow bit). 
CCR.xcc.v is set only on the basis of the normal 64-bit arithmetic overflow condition, 
like a normal 64-bit add. 


SPARC V8 | TADDccTV traps based on the 32-bit overflow condition, just as 
Compatibility | in the SPARC V8 architecture. Although the tagged add 
Note | instructions set the 64-bit condition codes CCR.xcc, there is no 
form of the instruction that traps on the 64-bit overflow 
condition. 


Exceptions illegal_instruction 
tag_overflow 


See Also TADDcc on page 345 
TSUBccTV^D on page 352 
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Tcc 





7.104 Trap on Integer Condition Codes (Tcc) 








Instruction  op3      cond  Operation                                               cc Test               Assembly Language Syntax               Class 
TA           11 1010  1000  Trap Always                                             1                     ta    i_or_x_cc, software_trap_number  A1 
TN           11 1010  0000  Trap Never                                              0                     tn    i_or_x_cc, software_trap_number  A1 
TNE          11 1010  1001  Trap on Not Equal                                       not Z                 tne†  i_or_x_cc, software_trap_number  A1 
TE           11 1010  0001  Trap on Equal                                           Z                     te‡   i_or_x_cc, software_trap_number  A1 
TG           11 1010  1010  Trap on Greater                                         not (Z or (N xor V))  tg    i_or_x_cc, software_trap_number  A1 
TLE          11 1010  0010  Trap on Less or Equal                                   Z or (N xor V)        tle   i_or_x_cc, software_trap_number  A1 
TGE          11 1010  1011  Trap on Greater or Equal                                not (N xor V)         tge   i_or_x_cc, software_trap_number  A1 
TL           11 1010  0011  Trap on Less                                            N xor V               tl    i_or_x_cc, software_trap_number  A1 
TGU          11 1010  1100  Trap on Greater, Unsigned                               not (C or Z)          tgu   i_or_x_cc, software_trap_number  A1 
TLEU         11 1010  0100  Trap on Less or Equal, Unsigned                         (C or Z)              tleu  i_or_x_cc, software_trap_number  A1 
TCC          11 1010  1101  Trap on Carry Clear (Greater than or Equal, Unsigned)   not C                 tcc◊  i_or_x_cc, software_trap_number  A1 
TCS          11 1010  0101  Trap on Carry Set (Less Than, Unsigned)                 C                     tcs∇  i_or_x_cc, software_trap_number  A1 
TPOS         11 1010  1110  Trap on Positive or Zero                                not N                 tpos  i_or_x_cc, software_trap_number  A1 
TNEG         11 1010  0110  Trap on Negative                                        N                     tneg  i_or_x_cc, software_trap_number  A1 
TVC          11 1010  1111  Trap on Overflow Clear                                  not V                 tvc   i_or_x_cc, software_trap_number  A1 
TVS          11 1010  0111  Trap on Overflow Set                                    V                     tvs   i_or_x_cc, software_trap_number  A1 

† synonym: tnz   ‡ synonym: tz   ◊ synonym: tgeu   ∇ synonym: tlu 


[Instruction format diagrams (i = 0 and i = 1 forms): fields at bits 31:30, 29, 28:25, 24:19, 18:14, 13, 12:11, 10 and below] 


cc1 :: cc0 Condition Codes Evaluated 
00 CCR.icc 
01 - (illegal_instruction) 
10 CCR.xcc 
11 - (illegal_instruction) 


Description The Tcc instruction evaluates the selected integer condition codes (icc or xcc) 
according to the cond field of the instruction, producing either a TRUE or FALSE 
result. If TRUE and no higher-priority exceptions or interrupt requests are pending, 
then a trap_instruction or htrap_instruction exception is generated. If FALSE, the 
trap_instruction (or htrap_instruction) exception does not occur and the instruction 
behaves like a NOP. 
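The cond-field evaluation can be written out as a lookup keyed by the 4-bit cond value from the instruction table. The Python below is a sketch for illustration, not a normative definition; `COND_TESTS` and `tcc_condition` are hypothetical names, and the flags are passed as 0 or 1 for whichever of icc or xcc was selected.

```python
# cond value -> test on (n, z, v, c), taken from the cc Test column.
COND_TESTS = {
    0b1000: lambda n, z, v, c: True,                    # TA
    0b0000: lambda n, z, v, c: False,                   # TN
    0b1001: lambda n, z, v, c: not z,                   # TNE
    0b0001: lambda n, z, v, c: z,                       # TE
    0b1010: lambda n, z, v, c: not (z or (n ^ v)),      # TG
    0b0010: lambda n, z, v, c: z or (n ^ v),            # TLE
    0b1011: lambda n, z, v, c: not (n ^ v),             # TGE
    0b0011: lambda n, z, v, c: n ^ v,                   # TL
    0b1100: lambda n, z, v, c: not (c or z),            # TGU
    0b0100: lambda n, z, v, c: c or z,                  # TLEU
    0b1101: lambda n, z, v, c: not c,                   # TCC
    0b0101: lambda n, z, v, c: c,                       # TCS
    0b1110: lambda n, z, v, c: not n,                   # TPOS
    0b0110: lambda n, z, v, c: n,                       # TNEG
    0b1111: lambda n, z, v, c: not v,                   # TVC
    0b0111: lambda n, z, v, c: v,                       # TVS
}


def tcc_condition(cond: int, n: int, z: int, v: int, c: int) -> bool:
    """Evaluate the Tcc condition; True means the trap would be requested."""
    return bool(COND_TESTS[cond](n, z, v, c))
```

One observable pattern in the table: each cond value with its high bit set is the logical negation of the value with that bit clear.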




















For brevity, in the remainder of this section the value of the "software trap number" 
used by Tcc will be referred to as "SWTN". 


In nonprivileged mode, if i = 0 the SWTN is specified by the least significant seven 
bits of "R[rs1] + R[rs2]". If i = 1, the SWTN is provided by the least significant seven 
bits of "R[rs1] + imm_trap_#". Therefore, the valid range of values for SWTN in 
nonprivileged mode is 0 to 127. The most significant 57 bits of SWTN are unused 
and should be supplied as zeroes by software. 


In privileged mode, if i = 0 the SWTN is specified by the least significant eight bits of 
"R[rs1] + R[rs2]". If i = 1, the SWTN is provided by the least significant eight bits of 
"R[rs1] + imm_trap_#". Therefore, the valid range of values for SWTN in privileged 
mode is 0 to 255. The most significant 56 bits of SWTN are unused and should be 
supplied as zeroes by software. 


Generally, values of 0 ≤ SWTN ≤ 127 are used to trap to privileged-mode software 
and values of 128 ≤ SWTN ≤ 255 are used to trap to hyperprivileged-mode software. 
The behavior of Tcc, based on the privilege mode in effect when it is executed and 
the value of the supplied SWTN, is as follows: 


Behavior of Tcc instruction 

Privilege Mode in effect     0 ≤ SWTN ≤ 127                 128 ≤ SWTN ≤ 255 
when Tcc is executed 

Nonprivileged                trap_instruction exception     - (not possible, because 
(PSTATE.priv = 0)            (to privileged mode)           SWTN is a 7-bit value in 
                             (256 ≤ TT ≤ 383)               nonprivileged mode) 

Privileged                   trap_instruction exception     htrap_instruction exception 
(PSTATE.priv = 1)            (to privileged mode)           (to hyperprivileged mode) 
                             (256 ≤ TT ≤ 383)               (384 ≤ TT ≤ 511) 


Programming | Tcc can be used to implement breakpointing, tracing, and calls to 
Note | privileged and hyperprivileged software. It can also be used for 
runtime checks, such as for out-of-range array indexes and integer 
overflow. 


Exceptions. An attempt to execute a Tcc instruction when any of the following 
conditions exist causes an illegal_instruction exception: 

■ instruction bit 29 is nonzero 
■ i = 0 and instruction bits 12:5 are nonzero 
■ i = 1 and instruction bits 10:8 are nonzero 
■ cc0 = 1 


If a Tcc instruction causes a trap_instruction trap, 256 plus the SWTN value is written 
into TT[TL]. Then the trap is taken and the virtual processor performs the normal 
trap entry procedure, as described in Trap Processing on page 443. 


Exceptions illegal_instruction 
trap_instruction (0 ≤ SWTN ≤ 127) 
htrap_instruction (128 ≤ SWTN ≤ 255) 
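The mapping from the computed operand to SWTN, exception name, and trap type can be summarized in a few lines. This Python sketch is illustrative only; `tcc_trap` is a hypothetical name, `value` stands for "R[rs1] + R[rs2]" or "R[rs1] + imm_trap_#", and it assumes the condition has already evaluated TRUE with no illegal_instruction condition present. The TT value for htrap_instruction is taken from the 384 to 511 range in the behavior table, which matches 256 plus SWTN.

```python
def tcc_trap(pstate_priv: int, value: int):
    """Return (exception_name, trap_type) for a taken Tcc."""
    # Seven significant bits in nonprivileged mode, eight in privileged mode.
    swtn = value & (0xFF if pstate_priv else 0x7F)
    exception = "trap_instruction" if swtn <= 127 else "htrap_instruction"
    return exception, 256 + swtn
```

The same operand value can therefore produce different traps depending on mode: 0x85 yields SWTN 5 in nonprivileged mode but SWTN 133 in privileged mode.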
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TSUBcc 





7.105 Tagged Subtract 


Instruction op3 Operation Assembly Language Syntax Class 
TSUBcc 10 0001 Tagged Subtract and modify cc's tsubcc reg_rs1, reg_or_imm, reg_rd A1 





[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 


Description This instruction computes "R[rs1] - R[rs2]" if i = 0, or 
"R[rs1] - sign_ext(simm13)" if i = 1. 


TSUBcc modifies the integer condition codes (icc and xcc). 


A tag overflow condition occurs if bit 1 or bit 0 of either operand is nonzero or if the 
subtraction generates 32-bit arithmetic overflow; that is, the operands have different 
values in bit 31 (the 32-bit sign bit) and the sign of the 32-bit difference in bit 31 
differs from bit 31 of R[rs1]. 


If a TSUBcc causes a tag overflow, the 32-bit overflow bit (CCR.icc.v) is set to 1; if 
TSUBcc does not cause a tag overflow, CCR.icc.v is set to 0. 


In either case, the remaining integer condition codes (both the other CCR.icc bits and 
all the CCR.xcc bits) are also updated as they would be for a normal subtract 
instruction. In particular, the setting of the CCR.xcc.v bit is not determined by the 
tag overflow condition (tag overflow is used only to set the 32-bit overflow bit). 
CCR.xcc.v is set based on the 64-bit arithmetic overflow condition, like a normal 64-bit 
subtract. 


An attempt to execute a TSUBcc instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal_instruction exception. 


Exceptions illegal_instruction 


See Also TADDcc on page 345 
TSUBccTV^D on page 352 
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TSUBccTV (Deprecated) 





7.106 Tagged Subtract and Trap on Overflow 


The TSUBccTV instruction is deprecated and should not be used in new 
software. The TSUBcc instruction followed by the BPVS instruction (with 
instructions to save the pre-TSUBcc integer condition codes if necessary) should 
be used instead. 








Opcode op3 Operation Assembly Language Syntax Class 
TSUBccTV^D 10 0011 Tagged Subtract and modify cc's or Trap on Overflow tsubcctv reg_rs1, reg_or_imm, reg_rd D2 





[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 


Description This instruction computes "R[rs1] - R[rs2]" if i = 0, or "R[rs1] - sign_ext(simm13)" 
if i = 1. 


TSUBccTV modifies the integer condition codes (icc and xcc) if it does not trap. 


A tag overflow condition occurs if bit 1 or bit 0 of either operand is nonzero or if the 
subtraction generates 32-bit arithmetic overflow; that is, the operands have different 
values in bit 31 (the 32-bit sign bit) and the sign of the 32-bit difference in bit 31 
differs from bit 31 of R[rs1]. 


An attempt to execute a TSUBccTV instruction when i = 0 and instruction bits 12:5 are 
nonzero causes an illegal_instruction exception. 


If TSUBccTV causes a tag overflow, then a tag_overflow exception is generated and 
R[rd] and the integer condition codes remain unchanged. If a TSUBccTV does not 
cause a tag overflow condition, the difference is written into R[rd] and the integer 
condition codes are updated. CCR.icc.v is set to 0 to indicate no 32-bit overflow. 


In either case, the remaining integer condition codes (both the other CCR.icc bits and 
all the CCR.xcc bits) are also updated as they would be for a normal subtract 
instruction. In particular, the setting of the CCR.xcc.v bit is not determined by the 
tag overflow condition (tag overflow is used only to set the 32-bit overflow bit). 
CCR.xcc.v is set only on the basis of the normal 64-bit arithmetic overflow condition, 
like a normal 64-bit subtract. 


SPARC V8 | TSUBccTV traps based on the 32-bit overflow condition, just as 
Compatibility | in the SPARC V8 architecture. Although the tagged subtract 
Note | instructions set the 64-bit condition codes CCR.xcc, there is no 
form of the instruction that traps on the 64-bit overflow 
condition. 


Exceptions illegal_instruction 
tag_overflow 


See Also TADDccTV^D on page 346 
TSUBcc on page 351 
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UDIV, UDIVcc (Deprecated) 





7.107 Unsigned Divide (64-bit ÷ 32-bit) 


The UDIV and UDIVcc instructions are deprecated and should not be used in 
new software. The UDIVX instruction should be used instead. 





Opcode op3 Operation Assembly Language Syntax Class 
UDIV^D 00 1110 Unsigned Integer Divide udiv reg_rs1, reg_or_imm, reg_rd D2 
UDIVcc^D 01 1110 Unsigned Integer Divide and modify cc's udivcc reg_rs1, reg_or_imm, reg_rd D2 





[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 

Description The unsigned divide instructions perform 64-bit by 32-bit division, producing a 32-bit 
result. If i = 0, they compute "(Y :: R[rs1]{31:0}) ÷ R[rs2]{31:0}". Otherwise (that is, 
if i = 1), the divide instructions compute "(Y :: R[rs1]{31:0}) ÷ 
(sign_ext(simm13){31:0})". In either case, if overflow does not occur, the less 
significant 32 bits of the integer quotient are sign- or zero-extended to 64 bits and are 
written into R[rd]. 


The contents of the Y register are undefined after any 64-bit by 32-bit integer divide 
operation. 


Unsigned Divide 


Unsigned divide (UDIV, UDIVcc) assumes an unsigned integer doubleword 
dividend (Y :: R[rs1]{31:0}) and an unsigned integer word divisor R[rs2]{31:0} or 
(sign_ext(simm13){31:0}) and computes an unsigned integer word quotient (R[rd]). 
Immediate values in simm13 are in the ranges 0 to 2^12 - 1 and 2^32 - 2^12 to 2^32 - 1 for 
unsigned divide instructions. 


Unsigned division rounds an inexact rational quotient toward zero. 


Programming | The rational quotient is the infinitely precise result quotient. It 
Note | includes both the integer part and the fractional part of the 
result. For example, the rational quotient of 11/4 = 2.75 (integer 
part = 2, fractional part = .75). 


The result of an unsigned divide instruction can overflow the less significant 32 bits 
of the destination register R[rd] under certain conditions. When overflow occurs, the 
largest appropriate unsigned integer is returned as the quotient in R[rd]. The 
condition under which overflow occurs and the value returned in R[rd] under this 
condition are specified in TABLE 7-15. 


TABLE 7-15 UDIV / UDIVcc Overflow Detection and Value Returned 

Condition Under Which Overflow Occurs    Value Returned in R[rd] 
Rational quotient ≥ 2^32                 2^32 - 1 (0000 0000 FFFF FFFF₁₆) 


When no overflow occurs, the 32-bit result is zero-extended to 64 bits and written 
into register R[rd]. 


UDIV does not affect the condition code bits. UDIVcc writes the integer condition 
code bits as shown in the following table. Note that negative (N) and zero (Z) are set 
according to the value of R[rd] after it has been set to reflect overflow, if any. 


Bit      Effect on bit of UDIVcc instruction 
icc.n    Set if R[rd]{31} = 1 
icc.z    Set if R[rd]{31:0} = 0 
icc.v    Set if overflow (per TABLE 7-15) 
icc.c    Zero 
xcc.n    Set if R[rd]{63} = 1 
xcc.z    Set if R[rd]{63:0} = 0 
xcc.v    Zero 
xcc.c    Zero 


An attempt to execute a UDIV or UDIVcc instruction when i = 0 and instruction bits 
12:5 are nonzero causes an illegal_instruction exception. 


Exceptions illegal_instruction 
division_by_zero 


See Also RDY on page 287 
SDIV[cc] on page 304 
UMUL[cc] on page 356 
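The dividend construction and the overflow clamp of TABLE 7-15 can be modeled briefly. This Python sketch is illustrative only; `udiv` is a hypothetical name, the operand is assumed to have been resolved already from R[rs2] or sign_ext(simm13), and a Python `ZeroDivisionError` stands in for the division_by_zero exception.

```python
MASK32 = 0xFFFFFFFF


def udiv(y: int, r_rs1: int, operand2: int):
    """Return (r_rd, overflow) for UDIV: (Y :: R[rs1]{31:0}) / operand2{31:0}."""
    divisor = operand2 & MASK32
    if divisor == 0:
        raise ZeroDivisionError("division_by_zero")
    dividend = ((y & MASK32) << 32) | (r_rs1 & MASK32)
    quotient = dividend // divisor  # rounds toward zero for unsigned values
    overflow = quotient >= (1 << 32)
    # On overflow the largest 32-bit unsigned value is returned.
    return (MASK32 if overflow else quotient), overflow
```

With Y = 0 the instruction behaves as a plain 32-bit unsigned divide; overflow is only possible when Y is greater than or equal to the divisor.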
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UMUL, UMULcc (Deprecated) 





7.108 Unsigned Multiply (32-bit) 


The UMUL and UMULcc instructions are deprecated and should not be used in 
new software. The MULX instruction should be used instead. 





Opcode op3 Operation Assembly Language Syntax Class 
UMUL^D 00 1010 Unsigned Integer Multiply umul reg_rs1, reg_or_imm, reg_rd D2 
UMULcc^D 01 1010 Unsigned Integer Multiply and modify cc's umulcc reg_rs1, reg_or_imm, reg_rd D2 





[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 


Description The unsigned multiply instructions perform 32-bit by 32-bit multiplications, 
producing 64-bit results. They compute "R[rs1]{31:0} × R[rs2]{31:0}" if i = 0, or 
"R[rs1]{31:0} × sign_ext(simm13){31:0}" if i = 1. They write the 32 most significant 
bits of the product into the Y register and all 64 bits of the product into R[rd]. 


Unsigned multiply instructions (UMUL, UMULcc) operate on unsigned integer 
word operands and compute an unsigned integer doubleword product. 


UMUL does not affect the condition code bits. UMULcc writes the integer condition 
code bits, icc and xcc, as shown below. 


Bit      Effect on bit by execution of UMULcc 
icc.n    Set to 1 if product{31} = 1; otherwise, set to 0 
icc.z    Set to 1 if product{31:0} = 0; otherwise, set to 0 
icc.v    Set to 0 
icc.c    Set to 0 
xcc.n    Set to 1 if product{63} = 1; otherwise, set to 0 
xcc.z    Set to 1 if product{63:0} = 0; otherwise, set to 0 
xcc.v    Set to 0 
xcc.c    Set to 0 





Note | 32-bit negative (icc.n) and zero (icc.z) condition codes are set 
according to the less significant word of the product, not 
according to the full 64-bit result. 


Programming | 32-bit overflow after UMUL or UMULcc is indicated by Y ≠ 0. 
Notes 


An attempt to execute a UMUL or UMULcc instruction when i = 0 and instruction 
bits 12:5 are nonzero causes an illegal_instruction exception. 


Exceptions illegal_instruction 


See Also MULScc on page 270 
RDY on page 287 
SMUL[cc] on page 311 
UDIV[cc] on page 354 
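The split of the product between R[rd] and Y, and the condition codes written by UMULcc, can be illustrated with a short model. This is a Python sketch under stated assumptions; `umulcc` is a hypothetical name, and the operand is assumed to have been resolved already from R[rs2] or sign_ext(simm13).

```python
MASK32 = 0xFFFFFFFF


def umulcc(r_rs1: int, operand2: int):
    """Return (r_rd, y, flags) for an unsigned 32 x 32 -> 64 multiply."""
    product = (r_rs1 & MASK32) * (operand2 & MASK32)  # always below 2**64
    y = product >> 32                                 # 32 most significant bits
    flags = {
        "icc.n": (product >> 31) & 1,
        "icc.z": int((product & MASK32) == 0),
        "icc.v": 0,
        "icc.c": 0,
        "xcc.n": (product >> 63) & 1,
        "xcc.z": int(product == 0),
        "xcc.v": 0,
        "xcc.c": 0,
    }
    return product, y, flags
```

Multiplying 0x10000 by 0x10000 gives a product whose low word is zero, so icc.z is set while xcc.z is clear, and Y ≠ 0 signals that the result no longer fits in 32 bits.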
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WRasr 


7.109 Write Ancillary State Register 


Instruction        rd     Operation                                                   Assembly Language Syntax                     Class 
WRY^D              0      Write Y register (deprecated)                               wr reg_rs1, reg_or_imm, %y                   D1 
                   1      Reserved 
WRCCR              2      Write Condition Codes register                              wr reg_rs1, reg_or_imm, %ccr                 A1 
WRASI              3      Write ASI register                                          wr reg_rs1, reg_or_imm, %asi                 A1 
                   4      Reserved (read-only ASR (TICK)) 
                   5      Reserved (read-only ASR (PC)) 
WRFPRS             6      Write Floating-Point Registers Status register              wr reg_rs1, reg_or_imm, %fprs                A1 
                   7-14   Reserved 
                   15     used at higher privilege level 
WRPCR^P            16     Write Performance Control register (PCR)                    wr reg_rs1, reg_or_imm, %pcr                 A1 
WRPIC^P_PIC        17     Write Performance Instrumentation Counters (PIC)            wr reg_rs1, reg_or_imm, %pic                 A1 
                   18     Reserved (impl. dep. #8-V8-Cs20, #9-V8-Cs20) 
WRGSR              19     Write General Status register (GSR)                         wr reg_rs1, reg_or_imm, %gsr                 A1 
WRSOFTINT_SET^P    20     Set bits of per-virtual processor Soft Interrupt register   wr reg_rs1, reg_or_imm, %softint_set         N1 
WRSOFTINT_CLR^P    21     Clear bits of per-virtual processor Soft Interrupt register wr reg_rs1, reg_or_imm, %softint_clr         N1 
WRSOFTINT^P        22     Write per-virtual processor Soft Interrupt register         wr reg_rs1, reg_or_imm, %softint             N1 
WRTICK_CMPR^P      23     Write Tick Compare register                                 wr reg_rs1, reg_or_imm, %tick_cmpr           N1 
                   24     used at higher privilege level 
WRSTICK_CMPR^P     25     Write System Tick Compare register                          wr reg_rs1, reg_or_imm, %stick_cmpr†         N1 
                   26     Reserved (impl. dep. #8-V8-Cs20, 9-V8-Cs20) 
                   27     Reserved (impl. dep. #8-V8-Cs20, 9-V8-Cs20) 
                   28     Implementation dependent (impl. dep. #8-V8-Cs20, 9-V8-Cs20) 
                   29-31  Implementation dependent (impl. dep. #8-V8-Cs20, 9-V8-Cs20) 

† The original assembly language names for %stick and %stick_cmpr were, respectively, %sys_tick and %sys_tick_cmpr, which are 
now deprecated. Over time, assemblers will support the new %stick and %stick_cmpr names for these registers (which are consistent 
with %tick and %tick_cmpr). In the meantime, some existing assemblers may only recognize the original names. 


[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 


Description The WRasr instructions each store a value to the writable fields of the ancillary state 
register (ASR) specified by rd. 


The value stored by these instructions (other than the implementation-dependent 
variants) is as follows: if i = 0, store the value "R[rs1] xor R[rs2]"; if i = 1, store 
"R[rs1] xor sign_ext(simm13)". 


Note | The operation is exclusive-or. 
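Because the stored value is an exclusive-or rather than a sum, a plain constant is written only when one operand is zero. The Python sketch below illustrates this; `wr_value` and `sign_ext13` are hypothetical helper names, and values are modeled as 64-bit quantities.

```python
MASK64 = (1 << 64) - 1


def sign_ext13(simm13: int) -> int:
    """Sign-extend the 13-bit immediate field to a signed Python int."""
    simm13 &= 0x1FFF
    return simm13 - 0x2000 if simm13 & 0x1000 else simm13


def wr_value(r_rs1: int, i: int, r_rs2: int = 0, simm13: int = 0) -> int:
    """Value presented to the ASR: R[rs1] xor R[rs2] or R[rs1] xor sign_ext(simm13)."""
    operand = r_rs2 if i == 0 else sign_ext13(simm13)
    return (r_rs1 ^ operand) & MASK64
```

With a zero first operand the immediate passes through unchanged (`wr_value(0, 1, simm13=0x80)` gives 0x80), while a nonzero first operand toggles bits (`wr_value(0xFF, 1, simm13=0x0F)` gives 0xF0).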


The WRasr instruction with rd = 0 is a (deprecated) WRY instruction (which should 
not be used in new software). WRY is not a delayed-write instruction; the instruction 
immediately following a WRY observes the new value of the Y register. 


The WRY instruction is deprecated. It is recommended that all instructions that 
reference the Y register be avoided. 





WRCCR, WRFPRS, and WRASI are not delayed-write instructions. The instruction 
immediately following a WRCCR, WRFPRS, or WRASI observes the new value of 
the CCR, FPRS, or ASI register. 


WRFPRS waits for any pending floating-point operations to complete before writing 
the FPRS register. 


IMPL. DEP. #48-V8-Cs20: WRasr instructions with rd in the range 26-31 are 
available for implementation-dependent uses (impl. dep. #8-V8-Cs20). For a WRasr 
instruction with rd in the range 26-31, the following are implementation dependent: 

■ the interpretation of bits 18:0 in the instruction 
■ the operation(s) performed (for example, xor) to generate the value written to the 
ASR 
■ whether the instruction is nonprivileged or privileged (impl. dep. #9-V8-Cs20), 
and 
■ whether an attempt to execute the instruction causes an illegal_instruction 
exception. 


Note | See the section "Read/Write Ancillary State Registers (ASRs)" in 
Extending the UltraSPARC Architecture, contained in the separate 
volume UltraSPARC Architecture Application Notes, for a 
discussion of extending the SPARC V9 instruction set by means of 
read/write ASR instructions. 


V9 | Ancillary state registers may include (for example) timer, counter, 
Compatibility | diagnostic, self-test, and trap-control registers. 
Notes | The SPARC V8 WRIER, WRPSR, WRWIM, and WRTBR 
instructions do not exist in the UltraSPARC Architecture because 
the IER, PSR, TBR, and WIM registers do not exist in the 
UltraSPARC Architecture. 


See Ancillary State Registers on page 67 for more detailed information regarding ASR 
registers. 


Exceptions. An attempt to execute a WRasr instruction when any of the following 
conditions exist causes an illegal_instruction exception: 

■ i = 0 and instruction bits 12:5 are nonzero 
■ rd = 1, 4, 5, 7-14, 18, or 26-31 
■ rd = 15 and ((rs1 ≠ 0) or (i = 0)) 


An attempt to execute a WRPCR (impl. dep. #250-U3-Cs10), WRSOFTINT_SET, 
WRSOFTINT_CLR, WRSOFTINT, WRTICK_CMPR, or WRSTICK_CMPR instruction 
in nonprivileged mode (PSTATE.priv = 0) causes a privileged_opcode exception. 


If the floating-point unit is not enabled (FPRS.fef = 0 or PSTATE.pef = 0) or if the 
FPU is not present, then an attempt to execute a WRGSR instruction causes an 
fp_disabled exception. 


An attempt to execute a WRPIC instruction in nonprivileged mode (PSTATE.priv = 0) 
when PCR.priv = 1 causes a privileged_action exception. 


Exceptions illegal_instruction 
privileged_opcode 
fp_disabled 
privileged_action 


See Also RDasr on page 287 
WRPR on page 361 
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WRPR 





7.110 Write Privileged Register 





Instruction  op3      Operation                  rd     Assembly Language Syntax                      Class 
WRPR^P       11 0010  Write Privileged register                                                       A1 
                        TPC                      0      wrpr reg_rs1, reg_or_imm, %tpc 
                        TNPC                     1      wrpr reg_rs1, reg_or_imm, %tnpc 
                        TSTATE                   2      wrpr reg_rs1, reg_or_imm, %tstate 
                        TT                       3      wrpr reg_rs1, reg_or_imm, %tt 
                        (illegal_instruction)    4 
                        TBA                      5      wrpr reg_rs1, reg_or_imm, %tba 
                        PSTATE                   6      wrpr reg_rs1, reg_or_imm, %pstate 
                        TL                       7      wrpr reg_rs1, reg_or_imm, %tl 
                        PIL                      8      wrpr reg_rs1, reg_or_imm, %pil 
                        CWP                      9      wrpr reg_rs1, reg_or_imm, %cwp 
                        CANSAVE                  10     wrpr reg_rs1, reg_or_imm, %cansave 
                        CANRESTORE               11     wrpr reg_rs1, reg_or_imm, %canrestore 
                        CLEANWIN                 12     wrpr reg_rs1, reg_or_imm, %cleanwin 
                        OTHERWIN                 13     wrpr reg_rs1, reg_or_imm, %otherwin 
                        WSTATE                   14     wrpr reg_rs1, reg_or_imm, %wstate 
                        Reserved                 15 
                        GL                       16     wrpr reg_rs1, reg_or_imm, %gl 
                        Reserved                 17-31 


[Instruction format diagram: fields at bits 31:30, 29:25, 24:19, 18:14, 13, 12:5, 4:0] 


Description This instruction stores the value “R[rs1] xor R[rs2]" if i= 0, or "R[rs1] xor 
sign ext (simm13)" if i= 1 to the writable fields of the specified privileged state 
register. 


Note | The operation is exclusive-or. 
The rd field in the instruction determines the privileged register that is written. 
There are MAXPTL copies of the TPC, TNPC, TT, and TSTATE registers, one for each 


trap level. A write to one of these registers sets the register, indexed by the current 
value in the trap-level register (TL). 
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Exceptions 


See Also 


WRPR 


A WRPR to TL only stores a value to TL; it does not cause a trap, cause a return from 
a trap, or alter any machine state other than TL and state (such as PC, NPC, TICK, 
etc.) that is indirectly modified by every instruction. 


Programming Note | A WRPR of TL can be used to read the values of TPC, TNPC, and
TSTATE for any trap level; however, software must take care that
traps do not occur while the TL register is modified.


The WRPR instruction is a non-delayed-write instruction. The instruction
immediately following the WRPR observes any changes that the WRPR made to
virtual processor state.


MAXPTL is the maximum value that may be written by a WRPR to TL; an attempt to 
write a larger value results in MAXPTL being written to TL. For details, see TABLE 5-22 
on page 95. 


MAXPGL is the maximum value that may be written by a WRPR to GL; an attempt to
write a larger value results in MAXPGL being written to GL. For details, see TABLE 5-23
on page 97.


Exceptions. An attempt to execute a WRPR instruction in nonprivileged mode
(PSTATE.priv = 0) causes a privileged_opcode exception.

An attempt to execute a WRPR instruction when any of the following conditions
exist causes an illegal_instruction exception:

■ i = 0 and instruction bits 12:5 are nonzero
■ rd = 4
■ rd = 15, or 17-31 (reserved for future versions of the architecture)
■ 0 ≤ rd ≤ 3 (attempt to write TPC, TNPC, TSTATE, or TT register) while TL = 0
(current trap level is zero) and the virtual processor is in privileged mode.

Implementation Note | In nonprivileged mode, illegal_instruction exception due to
0 ≤ rd ≤ 3 and TL = 0 does not occur; the privileged_opcode
exception occurs instead.


Exceptions privileged_opcode
illegal_instruction

See Also RDPR on page 290
WRasr on page 358
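The WRPR rules above can be summarized in a small behavioral model. This is only a sketch under stated assumptions: the function name `wrpr` and its argument layout are hypothetical, `MAXPTL = 2` and `MAXPGL = 2` are assumed values chosen for illustration, per-register writable-field masks are ignored, and the clamp on TL and GL is a simplification of the rules in TABLE 5-22 and TABLE 5-23.

```python
MASK64 = (1 << 64) - 1
MAXPTL = 2  # assumed value, for illustration only
MAXPGL = 2  # assumed value, for illustration only


def sign_ext13(simm13):
    """Sign-extend a 13-bit immediate to a 64-bit value."""
    simm13 &= 0x1FFF
    if simm13 & 0x1000:
        simm13 -= 0x2000
    return simm13 & MASK64


def wrpr(rd, rs1_val, i, operand, priv, tl, bits_12_5=0):
    """Return ('trap', name) or ('write', rd, value) for one WRPR.

    operand is R[rs2] when i == 0 and simm13 when i == 1.
    """
    # Nonprivileged execution always takes privileged_opcode first.
    if not priv:
        return ("trap", "privileged_opcode")
    if (i == 0 and bits_12_5 != 0) or rd == 4 or rd == 15 or 17 <= rd <= 31:
        return ("trap", "illegal_instruction")
    # TPC, TNPC, TSTATE and TT do not exist at trap level 0.
    if 0 <= rd <= 3 and tl == 0:
        return ("trap", "illegal_instruction")
    src2 = sign_ext13(operand) if i == 1 else operand & MASK64
    value = (rs1_val ^ src2) & MASK64  # the operation is exclusive-or
    if rd == 7:   # TL: larger values are written as MAXPTL
        value = min(value, MAXPTL)
    if rd == 16:  # GL: larger values are written as MAXPGL
        value = min(value, MAXPGL)
    return ("write", rd, value)
```

Because the operation is an exclusive-or, a plain "write this value" is obtained by using %g0 as rs1, so that the written value equals the second operand.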
7.111 XOR Logical Operation

Instruction  op3      Operation                        Assembly Language Syntax               Class
XOR          00 0011  Exclusive or                     xor     reg_rs1, reg_or_imm, reg_rd    A1
XORcc        01 0011  Exclusive or and modify cc's     xorcc   reg_rs1, reg_or_imm, reg_rd    A1
XNOR         00 0111  Exclusive nor                    xnor    reg_rs1, reg_or_imm, reg_rd    A1
XNORcc       01 0111  Exclusive nor and modify cc's    xnorcc  reg_rs1, reg_or_imm, reg_rd    A1

Instruction format (bit positions):

  31:30   29:25   24:19   18:14   13     12:5   4:0
  10      rd      op3     rs1     i=0    -      rs2
  10      rd      op3     rs1     i=1    simm13 (12:0)

Description These instructions implement bitwise logical xor operations. They compute "R[rs1]
op R[rs2]" if i = 0, or "R[rs1] op sign_ext(simm13)" if i = 1, and write the result into
R[rd].

XORcc and XNORcc modify the integer condition codes (icc and xcc). They set the
condition codes as follows:

■ icc.v, icc.c, xcc.v, and xcc.c are set to 0
■ icc.n is copied from bit 31 of the result
■ xcc.n is copied from bit 63 of the result
■ icc.z is set to 1 if bits 31:0 of the result are zero (otherwise to 0)
■ xcc.z is set to 1 if all 64 bits of the result are zero (otherwise to 0)

Programming Note | XNOR (and XNORcc) is identical to the xor_not (and set condition
codes) xor_not_cc logical operation, respectively.

An attempt to execute an XOR, XORcc, XNOR, or XNORcc instruction when i = 0 and
instruction bits 12:5 are nonzero causes an illegal_instruction exception.

Exceptions illegal_instruction
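As an illustration of the condition-code rules for XORcc and XNORcc, the following sketch computes the 64-bit result together with icc and xcc. The function name `xor_family` and the dictionary layout are hypothetical; operands are assumed to be already resolved (R[rs2] or the sign-extended immediate).

```python
MASK64 = (1 << 64) - 1


def xor_family(rs1_val, src2_val, negate=False):
    """Model XOR (negate=False) or XNOR (negate=True) on 64-bit operands.

    Returns (result, cc), where cc holds the values that the cc-setting
    variants (XORcc, XNORcc) would write to icc and xcc.
    """
    result = (rs1_val ^ src2_val) & MASK64
    if negate:
        result ^= MASK64  # XNOR is the bitwise complement of XOR
    cc = {
        "icc": {
            "n": (result >> 31) & 1,
            "z": int((result & 0xFFFFFFFF) == 0),
            "v": 0,
            "c": 0,
        },
        "xcc": {
            "n": (result >> 63) & 1,
            "z": int(result == 0),
            "v": 0,
            "c": 0,
        },
    }
    return result, cc
```

A result such as 0x1_0000_0000 shows why icc.z and xcc.z are distinct: the low 32 bits are zero, so icc.z = 1, while the full 64-bit value is nonzero, so xcc.z = 0.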
CHAPTER 8


IEEE Std 754-1985 Requirements for 
UltraSPARC Architecture 2005 


The IEEE Std 754-1985 floating-point standard contains a number of implementation 
dependencies. This chapter specifies choices for these implementation dependencies, 
to ensure that SPARC V9 implementations are as consistent as possible. 


The chapter contains these major sections: 


Traps Inhibiting Results on page 365. 
Underflow Behavior on page 366. 

Integer Overflow Definition on page 367. 
Floating-Point Nonstandard Mode on page 368. 
Arithmetic Result Tables on page 368. 


Exceptions are discussed in this chapter on the assumption that instructions are 
implemented in hardware. If an instruction is implemented in software, it may not 
trigger hardware exceptions but its behavior as observed by nonprivileged software 
(other than timing) must be the same as if it was implemented in hardware. 





8.1 Traps Inhibiting Results

As described in Floating-Point State Register (FSR) on page 58 and elsewhere, when a
floating-point trap occurs, the following conditions are true:

■ The destination floating-point register(s) (the F registers) are unchanged.
■ The floating-point condition codes (fcc0, fcc1, fcc2, and fcc3) are unchanged.
■ The FSR.aexc (accrued exceptions) field is unchanged.
■ The FSR.cexc (current exceptions) field is unchanged except for
IEEE_754_exceptions; in that case, cexc contains a bit set to 1, corresponding to
the exception that caused the trap. Only one bit shall be set in cexc.




Instructions causing an fp_exception_other trap because of unfinished or
unimplemented FPops execute as if by hardware; that is, such a trap is undetectable
by application software, except that timing may be affected.

Programming Note | A user-mode trap handler invoked for an IEEE_754_exception,
whether as a direct result of a hardware fp_exception_ieee_754
trap or as an indirect result of privileged software handling of
an fp_exception_other trap with FSR.ftt = unfinished_FPop or
FSR.ftt = unimplemented_FPop, can rely on the following
behavior:

■ The address of the instruction that caused the exception will
be available.
■ The destination floating-point register(s) are unchanged from
their state prior to that instruction's execution.
■ The floating-point condition codes (fcc0, fcc1, fcc2, and
fcc3) are unchanged.
■ The FSR.aexc field is unchanged.
■ The FSR.cexc field contains exactly one bit set to 1,
corresponding to the exception that caused the trap.
■ The FSR.ftt, FSR.qne, and reserved fields of FSR are zero.





8.2 Underflow Behavior

An UltraSPARC Architecture virtual processor detects tininess before rounding
occurs (impl. dep. #55-V8-Cs10).

TABLE 8-1 summarizes what happens when an exact unrounded value u satisfying

0 ≤ |u| ≤ smallest normalized number

would round, if no trap intervened, to a rounded value r which might be zero,
subnormal, or the smallest normalized value.
TABLE 8-1 Floating-Point Underflow Behavior (Tininess Detected Before Rounding)

                               Underflow trap:  ufm = 1   ufm = 0   ufm = 0
                               Inexact trap:    nxm = x   nxm = 1   nxm = 0
u = r    r is minimum normal                    None      None      None
         r is subnormal                         UF        None      None
         r is zero                              None      None      None
u ≠ r    r is minimum normal                    UF        NX        uf nx
         r is subnormal                         UF        NX        uf nx
         r is zero                              UF        NX        uf nx

UF = fp_exception_ieee_754 trap with cexc.ufc = 1
NX = fp_exception_ieee_754 trap with cexc.nxc = 1
uf = cexc.ufc = 1, aexc.ufa = 1, no fp_exception_ieee_754 trap
nx = cexc.nxc = 1, aexc.nxa = 1, no fp_exception_ieee_754 trap

8.2.1 Trapped Underflow Definition (ufm = 1)

Since tininess is detected before rounding, trapped underflow occurs when the exact
unrounded result has magnitude between zero and the smallest normalized number
in the destination format.

Note | The wrapped exponent results intended to be delivered on
trapped underflows and overflows in IEEE 754 are irrelevant to
the UltraSPARC Architecture at the hardware and privileged
software levels. If they are created at all, it would be by user
software in a nonprivileged-mode trap handler.

8.2.2 Untrapped Underflow Definition (ufm = 0)

Untrapped underflow occurs when the exact unrounded result has magnitude
between zero and the smallest normalized number in the destination format and the
correctly rounded result in the destination format is inexact.
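The two definitions above can be read as a small decision procedure. The sketch below is one possible reading, not a normative model: the function name `underflow_outcome` and its boolean inputs are hypothetical, "tiny" is taken to mean that the exact result u lies strictly between zero and the smallest normalized number in magnitude, and "inexact" means that the rounded result r differs from u. The returned strings use the legend of TABLE 8-1.

```python
def underflow_outcome(tiny, inexact, ufm, nxm):
    """Classify a result using the legend of TABLE 8-1.

    tiny    -- exact result u has 0 < |u| < smallest normalized number
    inexact -- rounded result r differs from u
    ufm     -- FSR.tem underflow trap enable (0 or 1)
    nxm     -- FSR.tem inexact trap enable (0 or 1)
    """
    if not tiny:
        # u is zero or exactly the smallest normalized number.
        return "None"
    if ufm:
        # Trapped underflow depends only on tininess (before rounding).
        return "UF"
    if not inexact:
        # Untrapped underflow also requires an inexact rounded result.
        return "None"
    # Tiny and inexact with the underflow trap disabled.
    return "NX" if nxm else "uf nx"
```

For example, an exactly representable subnormal result traps only when ufm = 1, which is the single "UF" entry in the u = r half of the table.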





8.3 Integer Overflow Definition

■ F<sdq>TOi: When a NaN, infinity, large positive argument ≥ 2^31, or large
negative argument ≤ -(2^31 + 1) is converted to an integer, the invalid_current
(nvc) bit of FSR.cexc is set to 1, and if the floating-point invalid trap is enabled
(FSR.tem.nvm = 1), the fp_exception_IEEE_754 exception is raised. If the
floating-point invalid trap is disabled (FSR.tem.nvm = 0), no trap occurs and a
numerical result is generated: if the sign bit of the operand is 0, the result is
2^31 - 1; if the sign bit of the operand is 1, the result is -2^31.

■ F<sdq>TOx: When a NaN, infinity, large positive argument ≥ 2^63, or large
negative argument ≤ -(2^63 + 1) is converted to an extended integer, the
invalid_current (nvc) bit of FSR.cexc is set to 1, and if the floating-point invalid
trap is enabled (FSR.tem.nvm = 1), the fp_exception_IEEE_754 exception is
raised. If the floating-point invalid trap is disabled (FSR.tem.nvm = 0), no trap
occurs and a numerical result is generated: if the sign bit of the operand is 0, the
result is 2^63 - 1; if the sign bit of the operand is 1, the result is -2^63.
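The overflow rule can be sketched as follows, using a Python float as a stand-in for the source operand. The function name `f_to_int` and its return convention are hypothetical, and the setting of cexc.nxc for a truncated fraction is not modelled. The conversion itself truncates toward zero, which is why the negative bound is -(2^31 + 1) rather than -2^31: a value such as -2147483648.5 still converts to -2^31.

```python
import math


def f_to_int(x, bits=32, nvm=0):
    """Model an F<sdq>TOi (bits=32) or F<sdq>TOx (bits=64) conversion.

    Returns ('ok', value), ('nv', value) when the invalid bit is set
    without a trap, or ('trap', name) when FSR.tem.nvm = 1.
    """
    limit = 1 << (bits - 1)  # 2^31 or 2^63
    invalid = (
        math.isnan(x)
        or math.isinf(x)
        or x >= limit
        or x <= -(limit + 1)
    )
    if invalid:
        if nvm:
            return ("trap", "fp_exception_ieee_754")
        # The substituted result depends only on the operand's sign bit.
        negative = math.copysign(1.0, x) < 0
        return ("nv", -limit if negative else limit - 1)
    return ("ok", int(x))  # int() truncates toward zero
```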





8.4 Floating-Point Nonstandard Mode

On an UltraSPARC Architecture 2005 processor, all floating-point operations
produce results that conform to IEEE Std. 754, regardless of the setting of the
"nonstandard mode" bit, FSR.ns (impl. dep. #18-V8).





8.5 Arithmetic Result Tables

This section contains detailed tables, showing the results produced by various
floating-point operations, depending on their source operands.

Notes on source types:

■ Nn is a number in F[rsn], which may be normal or subnormal.
■ QNaNn and SNaNn are Quiet and Signaling Not-a-Number values in F[rsn],
respectively.

Notes on result types:

■ R: (rounded) result of operation, which may be normal, subnormal, zero, or
infinity. May also cause OF, UF, NX, unfinished.
■ dQNaN is the generated default Quiet NaN (sign = 0, exponent = all 1s,
fraction = all 1s). The sign of the default Quiet NaN is zero to distinguish it from
storage initialized to all ones.
■ QSNaNn is the Signalling NaN operand from F[rsn] with the Quiet bit asserted.
8.5.1 Floating-Point Add (FADD)

TABLE 8-2 Floating-Point Add operation (F[rs1] + F[rs2])

[Table body lost in extraction. Surviving entries: QNaN2; QSNaN2, NV; QSNaN1, NV.]

* if N1 = -N2, then **
** result is +0 unless rounding mode is round to -∞, in which case the result is -0

For the FADD instructions, R may be any number; its generation may cause OF, UF,
and/or NX.

Floating-point add is not commutative when both operands are NaN.
8.5.2 Floating-Point Subtract (FSUB)

TABLE 8-3 Floating-Point Subtract operation (F[rs1] - F[rs2])

[Table body lost in extraction. Surviving entries: QNaN1; QSNaN2, NV; QSNaN1, NV.]

* if N1 = N2, then **
** result is +0 unless rounding mode is round to -∞, in which case the result is -0

For the FSUB instructions, R may be any number; its generation may cause OF, UF,
and/or NX.

Note that -x ≠ 0 - x when x is zero or NaN.


8.5.3 Floating-Point Multiply

TABLE 8-4 Floating-Point Multiply operation (F[rs1] × F[rs2])

[Table body lost in extraction. Surviving entries: QNaN2; QSNaN2, NV; +R.]

R may be any number; its generation may cause OF, UF, and/or NX.

Floating-point multiply is not commutative when both operands are NaN.

FsMULd (FdMULq) never causes OF, UF, or NX.

A NaN input operand to FsMULd (FdMULq) must be widened to produce a double-precision
(quad-precision) NaN output, by filling the least-significant bits of the
NaN result with zeros.


8.5.4 Floating-Point Divide (FDIV)

TABLE 8-5 Floating-Point Divide operation (F[rs1] ÷ F[rs2])

[Table body lost in extraction. Surviving F[rs2] column headings: -∞, -N2, -0, +0, +N2, +∞, SNaN2.]

R may be any number; its generation may cause OF, UF, and/or NX.


8.5.5 Floating-Point Square Root (FSQRT)

TABLE 8-6 Floating-Point Square Root operation (√F[rs2])

F[rs2]    -∞          -N2         -0    +0    +N2    +∞    QNaN2    SNaN2
Result    dQNaN, NV   dQNaN, NV   -0    +0    +R     +∞    QNaN2    QSNaN2, NV

R may be any number; its generation may cause NX.

Square root cannot cause DZ, OF, or UF.


8.5.6 Floating-Point Compare (FCMP, FCMPE)

TABLE 8-7 Floating-Point Compare (FCMP, FCMPE) operation (F[rs1] ? F[rs2])

[Table body lost in extraction. Surviving F[rs1] row labels: +∞, QNaN1, SNaN1.]

* NV for FCMPE, but not for FCMP.

TABLE 8-8 FSR.fcc Encoding for Result of FCMP, FCMPE

fcc result    meaning
0             =
1             <
2             >
3             unordered

NaN is considered to be unequal to anything else, even the identical NaN bit
pattern.

FCMP/FCMPE cannot cause DZ, OF, UF, NX.
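The fcc encoding of TABLE 8-8 can be illustrated with ordinary Python floats. This is only a sketch: the function name `fcmp_fcc` is hypothetical, and the NV signalling that distinguishes FCMPE from FCMP (and signaling from quiet NaNs) is not modelled.

```python
import math


def fcmp_fcc(a, b):
    """Return the fcc value (0..3) that a compare of a with b would produce."""
    if math.isnan(a) or math.isnan(b):
        return 3  # unordered: a NaN is unequal to everything, itself included
    if a == b:
        return 0  # equal (+0 and -0 compare equal)
    return 1 if a < b else 2
```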
8.5.7 Floating-Point to Floating-Point Conversions
(F<s|d|q>TO<s|d|q>)

TABLE 8-9 Floating-Point to Float-Point Conversions (convert(F[rs2]))

F[rs2]    -SNaN2        -QNaN2   -∞   -N2   -0   +0   +N2   +∞   +QNaN2   +SNaN2
Result    -QSNaN2, NV   -QNaN2   -∞   -R    -0   +0   +R    +∞   +QNaN2   +QSNaN2, NV

For FsTOd:

■ the least-significant fraction bits of a normal number are filled with zero to fit in
double-precision format
■ the least-significant bits of a NaN result operand are filled with zero to fit in
double-precision format

For FsTOq and FdTOq:

■ the least-significant fraction bits of a normal number are filled with zero to fit in
quad-precision format
■ the least-significant bits of a NaN result operand are filled with zero to fit in
quad-precision format

For FqTOs and FdTOs:

■ the fraction is rounded according to the current rounding mode
■ the lower-order bits of a NaN source are discarded to fit in single-precision
format; this discarding is not considered a rounding operation, and will not cause
an NX exception

For FqTOd:

■ the fraction is rounded according to the current rounding mode
■ the least-significant bits of a NaN source are discarded to fit in double-precision
format; this discarding is not considered a rounding operation, and will not cause
an NX exception
TABLE 8-10 Floating-Point to Float-Point Conversion Exception Conditions

NV   • SNaN operand

OF   • FdTOs, FqTOs: the input is larger than can be expressed in single precision
     • FqTOd: the input is larger than can be expressed in double precision
     • does not occur during other conversion operations

UF   • FdTOs, FqTOs: the input is smaller than can be expressed in single precision
     • FqTOd: the input is smaller than can be expressed in double precision
     • does not occur during other conversion operations

NX   • FdTOs, FqTOs: the input fraction has more significant bits than can be held in a
       single precision fraction
     • FqTOd: the input fraction has more significant bits than can be held in a double
       precision fraction
     • does not occur during other conversion operations


8.5.8 Floating-Point to Integer Conversions
(F<s|d|q>TO<i|x>)

TABLE 8-11 Floating-Point to Integer Conversions (convert(F[rs2]))

[Table body lost in extraction.]

R may be any integer, and may cause NV, NX.

Float-to-Integer conversions are always treated as round-toward-zero (truncated).

These operations are invalid (due to integer overflow) under the conditions
described in Integer Overflow Definition on page 367.

TABLE 8-12 Floating-point to Integer Conversion Exception Conditions

NV   • SNaN operand
     • QNaN operand
     • ±∞ operand
     • integer overflow

NX   • non-integer source (truncation occurred)
8.5.9 Integer to Floating-Point Conversions
(F<i|x>TO<s|d|q>)

TABLE 8-13 Integer to Floating-Point Conversions
(convert(F[rs2]))

F[rs2]    -int   0    +int
Result    -R     +0   +R

R may be any number; its generation may cause NX.

TABLE 8-14 Floating-Point Conversion Exception Conditions

NX   • FxTOd, FxTOs, FiTOs (possible loss of precision)
     • not applicable to FiTOd, FxTOq, or FiTOq (FSR.cexc will
       always be cleared)
CHAPTER 9


Memory 





The UltraSPARC Architecture memory models define the semantics of memory 
operations. The instruction set semantics require that loads and stores behave as if 
they are performed in the order in which they appear in the dynamic control flow of 
the program. The actual order in which they are processed by the memory may be 
different. The purpose of the memory models is to specify what constraints, if any, 
are placed on the order of memory operations. 


The memory models apply both to uniprocessors and to shared-memory
multiprocessors. Formal memory models are necessary for precise definitions of the
interactions between multiple virtual processors and input/output devices in a 
shared memory configuration. Programming shared memory multiprocessors 
requires a detailed understanding of the operative memory model and the ability to 
specify memory operations at a low level in order to build programs that can safely 
and reliably coordinate their activities. For additional information on the use of the 
models in programming real systems, see Programming with the Memory Models, 
contained in the separate volume UltraSPARC Architecture Application Notes. 


This chapter contains a great deal of theoretical information so that the discussion of 
the UltraSPARC Architecture TSO memory model has sufficient background. 


This chapter describes memory models in these sections: 


Memory Location Identification on page 378. 

Memory Accesses and Cacheability on page 378. 

Memory Addressing and Alternate Address Spaces on page 381. 
SPARC V9 Memory Model on page 384. 

The UltraSPARC Architecture Memory Model — TSO on page 388. 
Nonfaulting Load on page 396. 

Store Coalescing on page 397. 
9.1 Memory Location Identification

A memory location is identified by an 8-bit address space identifier (ASI) and a 64-bit
memory address. The 8-bit ASI can be obtained from an ASI register or included
in a memory access instruction. The ASI used for an access can distinguish among
different 64-bit address spaces, such as Primary memory space, Secondary memory
space, and internal control registers. It can also apply attributes to the access, such as
whether the access should be performed in big- or little-endian byte order, or
whether the address should be taken as virtual or real.





9.2 Memory Accesses and Cacheability

Memory is logically divided into real memory (cached) and I/O memory
(noncached with and without side effects) spaces.

Real memory stores information without side effects. A load operation returns the
value most recently stored. Operations are side-effect-free in the sense that a load,
store, or atomic load-store to a location in real memory has no program-observable
effect, except upon that location (or, in the case of a load or load-store, on the
destination register).

I/O locations may not behave like memory and may have side effects. Load, store,
and atomic load-store operations performed on I/O locations may have observable
side effects, and loads may not return the value most recently stored. The value
semantics of operations on I/O locations are not defined by the memory models, but
the constraints on the order in which operations are performed are the same as they
would be if the I/O locations were real memory. The storage properties, contents,
semantics, ASI assignments, and addresses of I/O registers are implementation
dependent.

9.2.1 Coherence Domains


Two types of memory operations are supported in the UltraSPARC Architecture: 
cacheable and noncacheable accesses. The manner in which addresses are 
differentiated is implementation dependent. In some implementations, it is indicated 
in the page translation entry (TTE.cp). 


Although SPARC V9 does not specify memory ordering between cacheable and 
noncacheable accesses, the UltraSPARC Architecture maintains TSO ordering 
between memory references regardless of their cacheability. 
The UltraSPARC Architecture obeys the Sun-5 Ordering rules as documented in the
"Sun-4u/Sun-5 Ordering with TSO" specification.


9.2.1.1 Cacheable Accesses 


Accesses within the coherence domain are called cacheable accesses. They have these 
properties: 

■ Data reside in real memory locations.
■ Accesses observe supported cache coherency protocol(s).
■ The cache line size is 2^n bytes (where n ≥ 4), and can be different for each cache.


9.2.1.2 Noncacheable Accesses

Noncacheable accesses are outside of the coherence domain. They have the
following properties:

■ Data might not reside in real memory locations. Accesses may result in
programmer-visible side effects. An example is memory-mapped I/O control
registers.
■ Accesses do not observe supported cache coherency protocol(s).
■ The smallest unit in each transaction is a single byte.

The UltraSPARC Architecture MMU optionally includes an attribute bit in each page
translation, TTE.e, which when set signifies that this page has side effects.


Noncacheable accesses without side effects (TTE.e = 0) are processor-consistent and 
obey TSO memory ordering. In particular, processor consistency ensures that a 
noncacheable load that references the same location as a previous noncacheable store 
will load the data from the previous store. 


Noncacheable accesses with side effects (TTE.e = 1) are processor consistent and are 
strongly ordered. These accesses are described in more detail in the following 
section. 


9.2.1.3 Noncacheable Accesses with Side-Effect


Loads, stores, and load-stores to I/O locations might not behave with memory 
semantics. Loads and stores could have side effects; for example, a read access could 
clear a register or pop an entry off a FIFO. A write access could set a register address 
port so that the next access to that address will read or write a particular internal 
register. Such devices are considered order sensitive. Also, such devices may only 
allow accesses of a fixed size, so store merging of adjacent stores or stores within a 
16-byte region would cause an error (see Store Coalescing on page 397). 


Noncacheable accesses (other than block loads and block stores) to pages with side 
effects (TTE.e = 1) exhibit the following behavior: 




■ Noncacheable accesses are strongly ordered with respect to each other. Bus
protocol should guarantee that I/O transactions to the same device are delivered in
the order that they are received.
■ Noncacheable loads with the TTE.e bit = 1 will not be issued to the system until
all previous instructions have completed, and the store queue is empty.
■ Noncacheable store coalescing is disabled for accesses with TTE.e = 1.
■ A MEMBAR may be needed between side-effect and non-side-effect accesses. See
TABLE 9-3 on page 394.

Whether block loads and block stores adhere to the above behavior or ignore TTE.e
and always behave as if TTE.e = 0 is implementation-dependent (impl. dep. #410-S10,
#411-S10).


On UltraSPARC Architecture virtual processors, noncacheable and side-effect 
accesses do not observe supported cache coherency protocols (impl. dep. #120). 


Non-faulting loads (using ASI_PRIMARY_NO_FAULT[_LITTLE] or
ASI_SECONDARY_NO_FAULT[_LITTLE]) with the TTE.e bit = 1 cause a trap.

Prefetches to noncacheable addresses result in nops.


The processor does speculative instruction memory accesses and follows branches 
that it predicts are taken. Instruction addresses mapped by the MMU can be 
accessed even though they are not actually executed by the program. Normally, 
locations with side effects or that generate timeouts or bus errors are not mapped as 
instruction addresses by the MMU, so these speculative accesses will not cause 
problems. 


IMPL. DEP. #118-V9: The manner in which I/O locations are identified is 
implementation dependent. 


IMPL. DEP. #120-V9: The coherence and atomicity of memory operations between 
virtual processors and I/O DMA memory accesses are implementation dependent. 


V9 Compatibility Note | Operations to I/O locations are not guaranteed to be
sequentially consistent among themselves, as they are in SPARC
V8.


Systems supporting SPARC V8 applications that use memory-mapped I/O locations 
must ensure that SPARC V8 sequential consistency of I/O locations can be 
maintained when those locations are referenced by a SPARC V8 application. The 
MMU either must enforce such consistency or cooperate with system software or the 
virtual processor to provide it. 


IMPL. DEP. #121-V9: An implementation may choose to identify certain addresses 
and use an implementation-dependent memory model for references to them. 
9.3 Memory Addressing and Alternate Address Spaces

An address in SPARC V9 is a tuple consisting of an 8-bit address space identifier
(ASI) and a 64-bit byte-address offset within the specified address space. Memory is
byte-addressed, with halfword accesses aligned on 2-byte boundaries, word accesses
(which include instruction fetches) aligned on 4-byte boundaries, extended-word
and doubleword accesses aligned on 8-byte boundaries, and quadword quantities
aligned on 16-byte boundaries. With the possible exception of the cases described in
Memory Alignment Restrictions on page 102, an improperly aligned address in a load,
store, or load-store instruction always causes a trap to occur. The largest datum that
is guaranteed to be atomically read or written is an aligned doubleword¹. Also,
memory references to different bytes, halfwords, and words in a given doubleword
are treated for ordering purposes as references to the same location. Thus, the unit of
ordering for memory is a doubleword.


Notes | The doubleword is the coherency unit for update, but 
programmers should not assume that doubleword floating-point 
values are updated as a unit unless they are doubleword-aligned 
and always updated with double-precision loads and stores. 
Some programs use pairs of single-precision operations to load 
and store double-precision floating-point values when the 
compiler cannot determine that they are doubleword aligned. 


Also, although quad-precision operations are defined in the 
SPARC V9 architecture, the granularity of loads and stores for 
quad-precision floating-point values may be word or 
doubleword. 
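The alignment and ordering-unit rules above lend themselves to a short illustration. The sketch below is not part of the architecture definition: the names `ALIGNMENT`, `is_aligned`, and `same_ordering_unit` are hypothetical, and the exceptions listed in Memory Alignment Restrictions are not modelled.

```python
# Required alignment, in bytes, for each access size named in the text.
ALIGNMENT = {
    "byte": 1,
    "halfword": 2,
    "word": 4,           # includes instruction fetches
    "extended_word": 8,
    "doubleword": 8,
    "quadword": 16,
}


def is_aligned(address, access):
    """True if address is naturally aligned for the given access size."""
    return address % ALIGNMENT[access] == 0


def same_ordering_unit(addr_a, addr_b):
    """True if two byte addresses fall in the same aligned doubleword,
    which the text treats as a single location for ordering purposes."""
    return (addr_a >> 3) == (addr_b >> 3)
```

Under these rules, byte accesses to 0x1000 and 0x1007 are ordered as accesses to the same location, while 0x1007 and 0x1008 are not.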





9.3.1 Memory Addressing Types

The UltraSPARC Architecture supports the following types of memory addressing:

Virtual Addresses (VA). Virtual addresses are addresses produced by a virtual
processor that maps all systemwide, program-visible memory. Virtual addresses can
be presented in nonprivileged mode and privileged mode.

1. Two exceptions to this are the special ASI_TWIN_DW_NUCLEUS[_L] and ASI_TWINX_REAL[_L], which
provide hardware support for an atomic quad load to be used for TTE loads from TSBs.
Real addresses (RA). A real address is provided to privileged software to
describe the underlying physical memory allocated to it. Translation storage buffers
(TSBs) maintained by privileged software are used to translate privileged or
nonprivileged mode virtual addresses into real addresses. MMU bypass addresses in
privileged mode are also real addresses.

Nonprivileged software only uses virtual addresses. Privileged software uses virtual
and real addresses.

9.3.2 Memory Address Spaces

The UltraSPARC Architecture supports accessing memory using virtual or real
addresses. Multiple virtual address spaces within the same real address space are
distinguished by a context identifier (context ID).

Privileged software can create multiple virtual address spaces, using the primary
and secondary context registers to associate a context ID with every virtual address.
Privileged software manages the allocation of context IDs.

The full representation of a real address is as follows:

real_address = context_ID :: virtual_address

9.3.3 Address Space Identifiers


The virtual processor provides an address space identifier with every address. This 
ASI may serve several purposes: 


■ To identify which of several distinguished address spaces the 64-bit address offset
is addressing
■ To provide additional access control and attribute information, for example, to
specify the endianness of the reference
■ To specify the address of an internal control register in the virtual processor,
cache, or memory management hardware

Memory management hardware can associate an independent 2^64-byte memory
address space with each ASI. In practice, the three independent memory address
spaces (contexts) created by the MMU are Primary, Secondary, and Nucleus.

Programming Note | Independent address spaces, accessible through ASIs, make it
possible for system software to easily access the address space of
faulting software when processing exceptions or to implement
access to a client program's memory space by a server program.


Alternate-space load, store, load-store and prefetch instructions specify an explicit 
ASI to use for their data access. The behavior of the access depends on the current 
privilege mode. 
Non-alternate space load, store, load-store, and prefetch instructions use an implicit
ASI value that is determined by current virtual processor state (the current privilege
mode, the trap level (TL), and the value of PSTATE.cle). Instruction fetches use an
implicit ASI that depends only on the current mode and trap level.


The architecturally specified ASIs are listed in Chapter 10, Address Space Identifiers 
(ASIs). The operation of each ASI in nonprivileged and privileged modes is 
indicated in TABLE 10-1 on page 401. 


Attempts by nonprivileged software (PSTATE.priv = 0) to access restricted ASIs (ASI
bit 7 = 0) cause a privileged_action exception. Attempts by privileged software
(PSTATE.priv = 1) to access ASIs 30₁₆-7F₁₆ cause a privileged_action exception.

When TL = 0, normal accesses by the virtual processor to memory when fetching
instructions and performing loads and stores implicitly specify ASI_PRIMARY or
ASI_PRIMARY_LITTLE, depending on the setting of PSTATE.cle.





When TL = 1 or 2 (> 0 but ≤ MAXPTL), the implicit ASI in privileged mode is:

■ for instruction fetches, ASI_NUCLEUS
■ for loads and stores, ASI_NUCLEUS if PSTATE.cle = 0 or ASI_NUCLEUS_LITTLE
if PSTATE.cle = 1 (impl. dep. #124-V9).
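The implicit-ASI rules can be sketched as a lookup on trap level and PSTATE.cle. The function name `implicit_asi` is hypothetical. One assumption is made explicit in the code: because instruction fetches are said to use an implicit ASI that depends only on the current mode and trap level, the sketch treats instruction fetches at TL = 0 as always using ASI_PRIMARY, independent of PSTATE.cle.

```python
def implicit_asi(tl, cle, instruction_fetch=False):
    """Return the name of the implicit ASI for a non-alternate access.

    tl  -- current trap level (TL)
    cle -- PSTATE.cle (0 = big-endian data, 1 = little-endian data)
    """
    if tl == 0:
        # Assumption: fetches ignore PSTATE.cle, since the implicit ASI for
        # fetches depends only on the mode and trap level.
        if instruction_fetch or not cle:
            return "ASI_PRIMARY"
        return "ASI_PRIMARY_LITTLE"
    # TL > 0 (privileged mode)
    if instruction_fetch or not cle:
        return "ASI_NUCLEUS"
    return "ASI_NUCLEUS_LITTLE"
```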





SPARC V9 supports the PRIMARY[_LITTLE], SECONDARY[_LITTLE], and
NUCLEUS[_LITTLE] address spaces.























Accesses to other address spaces use the load/store alternate instructions. For these
accesses, the ASI is either contained in the instruction (for the register+register
addressing mode) or taken from the ASI register (for register-immediate
addressing).

ASIs are either nonrestricted or restricted-to-privileged:

■ A nonrestricted ASI (ASI range 80₁₆-FF₁₆) is one that may be used
independently of the privilege level (PSTATE.priv) at which the virtual processor
is running.
■ A restricted-to-privileged ASI (ASI range 00₁₆-2F₁₆) requires that the virtual
processor be in privileged mode for a legal access to occur.
The relationship between virtual processor state and ASI restriction is shown in
TABLE 9-1.

TABLE 9-1 Allowed Accesses to ASIs

ASI Value    Type                       Result of ASI Access   Result of ASI Access
                                        in NP Mode             in P Mode
00₁₆-2F₁₆    Restricted-to-privileged   privileged_action      Valid Access
                                        exception
80₁₆-FF₁₆    Nonrestricted              Valid Access           Valid Access
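TABLE 9-1, together with the earlier statement about ASIs 30₁₆-7F₁₆, can be expressed as a simple classification. The function name `asi_access` and its string results are hypothetical; the sketch only decides whether the access is permitted and does not model whether a particular ASI value is actually implemented.

```python
def asi_access(asi, privileged):
    """Classify an access to an 8-bit ASI value.

    Returns 'valid' or 'privileged_action'.
    """
    if not 0 <= asi <= 0xFF:
        raise ValueError("ASI must be an 8-bit value")
    if asi >= 0x80:
        return "valid"  # nonrestricted: usable at any privilege level
    if asi <= 0x2F:
        # restricted-to-privileged
        return "valid" if privileged else "privileged_action"
    # 0x30-0x7F: not accessible from privileged or nonprivileged mode
    return "privileged_action"
```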





Some restricted ASIs are provided as mandated by SPARC V9:
ASI_AS_IF_USER_PRIMARY[_LITTLE] and
ASI_AS_IF_USER_SECONDARY[_LITTLE]. The intent of these ASIs is to give
privileged software efficient, yet secure access to the memory space of nonprivileged
software.




















The normal address space is the primary address space, which is accessed by the
unrestricted ASI_PRIMARY[_LITTLE] ASIs. The secondary address space, which is
accessed by the unrestricted ASI_SECONDARY[_LITTLE] ASIs, is provided to allow
server software to access client software's address space.




















ASI_PRIMARY_NOFAULT[_LITTLE] and ASI_SECONDARY_NOFAULT[_LITTLE]
support nonfaulting loads. These ASIs may be used to color (that is, distinguish into 
classes) loads in the instruction stream so that, in combination with a judicious 
mapping of low memory and a specialized trap handler, an optimizing compiler can 
move loads outside of conditional control structures. 





9.4 SPARC V9 Memory Model 


The SPARC V9 processor architecture specified the organization and structure of a 
central processing unit but did not specify a memory system architecture. This 
section summarizes the MMU support required by an UltraSPARC Architecture 
processor. 


The memory models specify the possible order relationships between memory- 
reference instructions issued by a virtual processor and the order and visibility of 
those instructions as seen by other virtual processors. The memory model is 
intimately intertwined with the program execution model for instructions.


9.4.1 SPARC V9 Program Execution Model


The SPARC V9 strand model of a virtual processor consists of three units: an Issue 
Unit, a Reorder Unit, and an Execute Unit, as shown in FIGURE 9-1. 


[Figure: a Processor containing the Reorder Unit and Execute Unit, with an Instruction Path and a Data Path]

FIGURE 9-1 Processor Model: Uniprocessor System


The Issue Unit reads instructions over the instruction path from memory and issues 
them in program order to the Reorder Unit. Program order is precisely the order 
determined by the control flow of the program and the instruction semantics, under 
the assumption that each instruction is performed independently and sequentially. 


Issued instructions are collected and potentially reordered in the Reorder Unit, and 
then dispatched to the Execute Unit. Instruction reordering allows an 
implementation to perform some operations in parallel and to better allocate 
resources. The reordering of instructions is constrained to ensure that the results of 
program execution are the same as they would be if the instructions were performed 
in program order. This property is called processor self-consistency. 


Processor self-consistency requires that the result of execution, in the absence of any 
shared memory interaction with another virtual processor, be identical to the result 
that would be observed if the instructions were performed in program order. In the 
model in FIGURE 9-1, instructions are issued in program order and placed in the 
reorder buffer. The virtual processor is allowed to reorder instructions, provided it 
does not violate any of the data-flow constraints for registers or for memory. 


The data-flow order constraints for register reference instructions are these: 


1. An instruction that reads from or writes to a register cannot be performed until all 
earlier instructions that write to that register have been performed (read-after- 
write hazard; write-after-write hazard).

2. An instruction that writes to a register cannot be performed until all earlier
instructions that read that register have been performed (write-after-read hazard).


V9 Compatibility Note | An implementation can avoid blocking instruction execution in
case 2 and the write-after-write hazard in case 1 by using a
renaming mechanism that provides the old value of the register
to earlier instructions and the new value to later uses.
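The register data-flow constraints, and the relaxation that renaming allows, can be illustrated with a minimal sketch. The `Instr` type and both function names are hypothetical; each instruction is reduced to the sets of registers it reads and writes.

```python
from typing import FrozenSet, NamedTuple, Set


class Instr(NamedTuple):
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Classify the register hazards between two instructions in program order."""
    found = set()
    if earlier.writes & later.reads:
        found.add("RAW")   # constraint 1: read after write
    if earlier.writes & later.writes:
        found.add("WAW")   # constraint 1: write after write
    if earlier.reads & later.writes:
        found.add("WAR")   # constraint 2: write after read
    return found


def may_reorder(earlier: Instr, later: Instr, renaming: bool = False) -> bool:
    """True if `later` may be performed before `earlier`."""
    blocking = hazards(earlier, later)
    if renaming:
        # Renaming removes WAW and WAR blocking; a true RAW dependence remains.
        blocking -= {"WAW", "WAR"}
    return not blocking
```

With renaming enabled, only a read-after-write dependence still forces the two instructions to stay in program order.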


The data-flow order constraints for memory-reference instructions are those for 
register reference instructions, plus the following additional constraints: 


1. A memory-reference instruction that uses (loads or stores) the value at a location 
cannot be performed until all earlier memory-reference instructions that set (store 
to) that location have been performed (read-after-write hazard, write-after-write 
hazard). 


2. A memory-reference instruction that writes (stores to) a location cannot be 
performed until all previous instructions that read (load from) that location have 
been performed (write-after-read hazard). 


The memory-barrier instruction (MEMBAR) and the TSO memory model also constrain
the issue of memory-reference instructions. See Memory Ordering and Synchronization 
on page 393 and The UltraSPARC Architecture Memory Model — TSO on page 388 for 
a detailed description. 


The constraints on instruction execution assert a partial ordering on the instructions 
in the reorder buffer. Every one of the several possible orderings is a legal execution 
ordering for the program. See Appendix D, Formal Specification of the Memory Models, 
for more information.


9.4.2 Virtual Processor/Memory Interface Model


Each UltraSPARC Architecture virtual processor in a multiprocessor system is 
modeled as shown in FIGURE 9-2; that is, having two independent paths to memory: 
one for instructions and one for data. 

[Figure: Virtual Processors, each with separate Instructions and Data paths; Memory Transactions in Memory Order]

FIGURE 9-2 Data Memory Paths: Multiprocessor System








Data caches are maintained by hardware so their contents always appear to be 
consistent (coherent). Instruction caches are not required to be kept consistent with 
data caches and therefore require explicit program (software) action to ensure 
consistency when a program modifies an executing instruction stream. See 
Synchronizing Instruction and Data Memory on page 395 for details. Memory is shared 
in terms of address space, but it may be nonhomogeneous and distributed in an 
implementation. Caches are ignored in the model, since their functions are
transparent to the memory model.¹


In real systems, addresses may have attributes that the virtual processor must 
respect. The virtual processor executes loads, stores, and atomic load-stores in 
whatever order it chooses, as constrained by program order and the memory model. 


Instructions are performed in an order constrained by local dependencies. Using this 
dependency ordering, an execution unit submits one or more pending memory 
transactions to the memory. The memory performs transactions in memory order. The 
memory unit may perform transactions submitted to it out of order; hence, the 
execution unit must not concurrently submit two or more transactions that are 
required to be ordered, unless the memory unit can still guarantee in-order 
semantics. 


The memory accepts transactions, performs them, and then acknowledges their
completion. Multiple memory operations may be in progress at any time and may be
initiated in a nondeterministic fashion in any order, provided that all transactions to
a location preserve the per-virtual processor partial orderings. Memory transactions
may complete in any order. Once initiated, all memory operations are performed
atomically: loads from one location all see the same value, and the result of stores is
visible to all potential requestors at the same instant.

1. The model described here is only a model; implementations of UltraSPARC Architecture systems are
unconstrained as long as their observable behaviors match those of the model.


The order of memory operations observed at a single location is a total order that 
preserves the partial orderings of each virtual processor’s transactions to this 
address. There may be many legal total orders for a given program’s execution. 





9.5 The UltraSPARC Architecture Memory
Model — TSO 


The UltraSPARC Architecture is a model that specifies the behavior observable by 
software on UltraSPARC Architecture systems. Therefore, access to memory can be 
implemented in any manner, as long as the behavior observed by software conforms 
to that of the models described here. 


The SPARC V9 architecture defines three different memory models: Total Store Order
(TSO), Partial Store Order (PSO), and Relaxed Memory Order (RMO). 


All SPARC V9 processors must provide Total Store Order (or a more strongly 
ordered model, for example, Sequential Consistency) to ensure compatibility for 
SPARC V8 application software. 


All UltraSPARC Architecture virtual processors implement TSO ordering. The PSO 
and RMO models from SPARC V9 are not described in this UltraSPARC Architecture 
specification. UltraSPARC Architecture 2005 processors do not implement the PSO 
memory model directly, but all software written to run under PSO will execute 
correctly on an UltraSPARC Architecture 2005 processor (using the TSO model). 


Whether memory models represented by PSTATE.mm = 10₂ or 11₂ are supported in
an UltraSPARC Architecture processor is implementation dependent (impl. dep.
#113-V9-Ms10). If the 10₂ model is supported, then when PSTATE.mm = 10₂ the
implementation must correctly execute software that adheres to the RMO model
described in The SPARC Architecture Manual-Version 9. If the 11₂ model is supported,
its definition is implementation dependent and will be described in
implementation-specific documentation.


Programs written for Relaxed Memory Order will work in both Partial Store Order
and Total Store Order. Programs written for Partial Store Order will work in Total
Store Order. Programs written for a weak model, such as RMO, may execute more
quickly when run on hardware directly supporting that model, since the model
exposes more scheduling opportunities, but use of that model may also require extra
instructions to ensure synchronization. Multiprocessor programs written for a
stronger model will behave unpredictably if run in a weaker model.


Machines that implement sequential consistency (also called "strong ordering" or 
"strong consistency") automatically support programs written for TSO. Sequential 
consistency is not a SPARC V9 memory model. In sequential consistency, the loads, 
stores, and atomic load-stores of all virtual processors are performed by memory in 
a serial order that conforms to the order in which these instructions are issued by 
individual virtual processors. A machine that implements sequential consistency 
may deliver lower performance than an equivalent machine that implements TSO 
order. Although particular SPARC V9 implementations may support sequential 
consistency, portable software must not rely on the sequential consistency memory 
model. 


9.5.1 Memory Model Selection


The active memory model is specified by the 2-bit value in PSTATE.mm. The value
00₂ represents the TSO memory model; increasing values of PSTATE.mm indicate
increasingly weaker (less strongly ordered) memory models.


Writing a new value into PSTATE.mm causes subsequent memory reference 
instructions to be performed with the order constraints of the specified memory 
model. 


IMPL. DEP. #119-Ms10: The effect of an attempt to write an unsupported memory
model designation into PSTATE.mm is implementation dependent; however, it
should never result in a PSTATE.mm value greater than the one that was
written. In the case of an UltraSPARC Architecture implementation that only
supports the TSO memory model, PSTATE.mm always reads as zero and attempts to
write to it are ignored.
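One behavior that satisfies this constraint can be sketched as follows. The fallback policy (choose the largest supported value not exceeding the one written) is an assumption made for illustration; the actual effect is implementation dependent. The function name `write_pstate_mm` and its `supported` parameter are hypothetical.

```python
def write_pstate_mm(requested: int, supported=frozenset({0b00})) -> int:
    """Return the PSTATE.mm value that results from writing `requested`.

    supported -- set of memory model encodings the implementation provides;
                 TSO (00) is always treated as supported.
    """
    requested &= 0b11
    models = set(supported) | {0b00}
    if requested in models:
        return requested
    # Assumed policy: never end up with a value greater than the one written.
    return max(m for m in models if m <= requested)
```

On a TSO-only implementation (the default `supported` set), every write yields zero, which matches "always reads as zero".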


9.5.2 Programmer-Visible Properties of the UltraSPARC
Architecture TSO Model


Total Store Order must be provided for compatibility with existing SPARC V8 
programs. Programs that execute correctly in either RMO or PSO will execute 
correctly in the TSO model. 


The rules for TSO, in addition to those required for self-consistency (see page 385), 
are: 

■ Loads are blocking and ordered with respect to earlier loads.

■ Stores are ordered with respect to stores.

■ Atomic load-stores are ordered with respect to loads and stores.

■ Stores cannot bypass earlier loads.


Programming Note | Loads can bypass earlier stores to other addresses, which
maintains processor self-consistency.


Atomic load-stores are treated as both a load and a store and can only be applied to 
cacheable address spaces. 


Thus, TSO ensures the following behavior: 


■ Each load instruction behaves as if it were followed by a MEMBAR #LoadLoad
and #LoadStore.

■ Each store instruction behaves as if it were followed by a MEMBAR
#StoreStore.

■ Each atomic load-store behaves as if it were followed by a MEMBAR #LoadLoad,
#LoadStore, and #StoreStore.


In addition to the above TSO rules, the following rules apply to UltraSPARC 
Architecture memory models: 


■ A MEMBAR #StoreLoad must be used to prevent a load from bypassing a prior
store, if Strong Sequential Order (as defined in The UltraSPARC Architecture 
Memory Model — TSO on page 388) is desired. 


■ Accesses that have side effects are all strongly ordered with respect to each other.

■ A MEMBAR #Lookaside is not needed between a store and a subsequent load to
the same noncacheable address.

■ Load (LDXA) and store (STXA) instructions that reference certain internal ASIs
perform both an intra-virtual processor synchronization (that is, an implicit
MEMBAR #Sync operation before the load or store is executed) and an inter-virtual
processor synchronization (that is, all active virtual processors are brought
to a point where synchronization is possible, the load or store is executed, and all
virtual processors then resume instruction fetch and execution). The model-specific
PRM should indicate which ASIs require intra-virtual processor
synchronization, inter-virtual processor synchronization, or both.
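The effect of a load bypassing an earlier store, and of MEMBAR #StoreLoad, can be explored with a small operational model. This is a sketch under a common simplification, not the formal specification: each virtual processor is given a FIFO store buffer, loads read the newest matching local buffered store or else memory, and the barrier waits until the local buffer has drained. The function name and the tuple-based instruction encoding are hypothetical.

```python
def tso_outcomes(programs):
    """Enumerate final register values reachable under a store-buffer TSO model.

    programs -- one instruction list per virtual processor; each instruction is
                ("st", addr, value), ("ld", addr, reg) or ("membar_storeload",).
    """
    n = len(programs)
    outcomes, seen = set(), set()
    # State: program counters, per-processor store buffers, memory, registers.
    stack = [((0,) * n, ((),) * n, (), ())]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        if all(pcs[p] == len(programs[p]) and not bufs[p] for p in range(n)):
            outcomes.add(regs)
            continue
        for p in range(n):
            if bufs[p]:
                # Drain the oldest buffered store of processor p to memory.
                (addr, val), rest = bufs[p][0], bufs[p][1:]
                new_mem = tuple(sorted({**dict(mem), addr: val}.items()))
                stack.append((pcs, bufs[:p] + (rest,) + bufs[p + 1:], new_mem, regs))
            if pcs[p] < len(programs[p]):
                ins = programs[p][pcs[p]]
                new_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
                if ins[0] == "st":
                    new_buf = bufs[p] + ((ins[1], ins[2]),)
                    stack.append((new_pcs, bufs[:p] + (new_buf,) + bufs[p + 1:], mem, regs))
                elif ins[0] == "ld":
                    # Newest matching local buffered store wins, else memory (initially 0).
                    val = dict(mem).get(ins[1], 0)
                    for a, v in bufs[p]:
                        if a == ins[1]:
                            val = v
                    new_regs = tuple(sorted({**dict(regs), ins[2]: val}.items()))
                    stack.append((new_pcs, bufs, mem, new_regs))
                elif ins[0] == "membar_storeload" and not bufs[p]:
                    # The barrier completes only once earlier stores are in memory.
                    stack.append((new_pcs, bufs, mem, regs))
    return outcomes
```

Running it on the classic two-processor test (each processor stores to one location and then loads the other) shows that both loads can return 0 without a barrier, and that placing the barrier between the store and the load on each processor removes that outcome.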


9.5.3 TSO Ordering Rules


TABLE 9-2 summarizes the cases where a MEMBAR must be inserted between two 
memory operations on an UltraSPARC Architecture virtual processor running in 
TSO mode, to ensure that the operations appear to complete in a particular order. 
Memory operation ordering is not to be confused with processor consistency or 
deterministic operation; MEMBARs are required for deterministic operation of 
certain ASI register updates. 


Programming Note | To ensure software portability across systems, the MEMBAR
rules in this section should be followed (which may be stronger
than the rules in SPARC V9).


TABLE 9-2 is to be read as follows: Reading from row to column, the first memory
operation in program order in a row is followed by the memory operation found in
the column. Symbols used as table entries:

■ # - No intervening operation is required.

■ M - An intervening MEMBAR #StoreLoad or MEMBAR #Sync or
MEMBAR #MemIssue is required.

■ S - An intervening MEMBAR #Sync or MEMBAR #MemIssue is required.

■ nc - Noncacheable

■ e - Side effect

■ ne - No side effect


TABLE 9-2 Summary of UltraSPARC Architecture Ordering Rules (TSO Memory Model)

1. This table assumes that both noncacheable operations access the same device.

2. When the store and subsequent load access the same location, no intervening MEMBAR is required.


9.5.4 Hardware Primitives for Mutual Exclusion


In addition to providing memory-ordering primitives that allow programmers to 
construct mutual-exclusion mechanisms in software, the UltraSPARC Architecture 
provides three hardware primitives for mutual exclusion: 


■ Compare and Swap (CASA and CASXA)
■ Load Store Unsigned Byte (LDSTUB and LDSTUBA)
■ Swap (SWAP and SWAPA)


Each of these instructions has the semantics of both a load and a store in all three 
memory models. They are all atomic, in the sense that no other store to the same 
location can be performed between the load and store elements of the instruction. 
All of the hardware mutual-exclusion operations conform to the TSO memory model 
and may require barrier instructions to ensure proper data visibility. 


Atomic load-store instructions can be used only in the cacheable domains (not in 
noncacheable I/O addresses). An attempt to use an atomic load-store instruction to 
access a noncacheable page results in a data_access_exception exception. 


The atomic load-store alternate instructions can use a limited set of the ASIs. See the 
specific instruction descriptions for a list of the valid ASIs. An attempt to execute an 
atomic load-store alternate instruction with an invalid ASI results in a 
data_access_exception exception. 


9.5.4.1 Compare-and-Swap (CASA, CASXA) 


Compare-and-swap is an atomic operation that compares a value in a virtual 
processor register to a value in memory and, if and only if they are equal, swaps the 
value in memory with the value in a second virtual processor register. Both 32-bit 
(CASA) and 64-bit (CASXA) operations are provided. The compare-and-swap 
operation is atomic in the sense that once it begins, no other virtual processor can 
access the memory location specified until the compare has completed and the swap 
(if any) has also completed and is potentially visible to all other virtual processors in 
the system. 


Compare-and-swap is substantially more powerful than the other hardware 
synchronization primitives. It has an infinite consensus number; that is, it can 
resolve, in a wait-free fashion, an infinite number of contending processes. Because 
of this property, compare-and-swap can be used to construct wait-free algorithms 
that do not require the use of locks. For examples, see Programming with the Memory 
Models, contained in the separate volume UltraSPARC Architecture Application Notes. 
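The compare-and-swap semantics can be modeled as follows, together with a retry loop that builds an atomic add on top of it. This is a sketch: the `Memory` class is hypothetical, a Python lock stands in for the atomicity that hardware provides, and the retry loop shown is a simple lock-free pattern rather than a wait-free construction.

```python
import threading


class Memory:
    """Toy shared memory; the internal lock models hardware atomicity only."""

    def __init__(self):
        self._cells = {}
        self._atomic = threading.Lock()

    def load(self, addr):
        with self._atomic:
            return self._cells.get(addr, 0)

    def cas(self, addr, expected, new):
        """Model of CASA/CASXA: return the old value; store `new` only on a match."""
        with self._atomic:
            old = self._cells.get(addr, 0)
            if old == expected:
                self._cells[addr] = new
            return old


def atomic_add(mem, addr, delta):
    """Add `delta` to the cell at `addr`, retrying until the CAS succeeds."""
    while True:
        seen = mem.load(addr)
        if mem.cas(addr, seen, seen + delta) == seen:
            return seen + delta


def run_demo(workers=4, iterations=500):
    """Increment one cell from several threads and return its final value."""
    mem = Memory()

    def worker():
        for _ in range(iterations):
            atomic_add(mem, 0x100, 1)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem.load(0x100)
```

Because a failed compare leaves memory unchanged and returns the value actually found, no increment is lost regardless of how the threads interleave.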


9.5.4.2 Swap (SWAP)


SWAP atomically exchanges the lower 32 bits in a virtual processor register with a 
word in memory. SWAP has a consensus number of two; that is, it cannot resolve 
more than two contending processes in a wait-free fashion.


9.5.4.3 Load Store Unsigned Byte (LDSTUB)


LDSTUB loads a byte value from memory to a register and writes the value FF₁₆ into
the addressed byte atomically. LDSTUB is the classic test-and-set instruction. Like 
SWAP, it has a consensus number of two and so cannot resolve more than two 
contending processes in a wait-free fashion. 
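The test-and-set use of LDSTUB can be sketched with a sequential model. The helper names `try_lock` and `unlock` are hypothetical, and the convention that a zero byte means "free" is an assumption of this example; the hardware performs the load and the store as one atomic operation, which this single-threaded model does not need to enforce.

```python
def ldstub(mem: bytearray, addr: int) -> int:
    """Model of LDSTUB: return the old byte and write FF (hex) in its place."""
    old = mem[addr]
    mem[addr] = 0xFF
    return old


def try_lock(mem: bytearray, addr: int) -> bool:
    """Acquire the lock byte if it was free (assumed convention: 0 = free)."""
    return ldstub(mem, addr) == 0


def unlock(mem: bytearray, addr: int) -> None:
    """Release the lock by storing zero to the lock byte."""
    mem[addr] = 0
```

A second `try_lock` on a held lock returns False and leaves the byte at FF₁₆, so a caller can simply retry until the holder calls `unlock`.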


9.5.5 Memory Ordering and Synchronization


The UltraSPARC Architecture provides some level of programmer control over 
memory ordering and synchronization through the MEMBAR and FLUSH 
instructions. 


MEMBAR serves two distinct functions in SPARC V9. One variant of the MEMBAR, 
the ordering MEMBAR, provides a way for the programmer to control the order of 
loads and stores issued by a virtual processor. The other variant of MEMBAR, the 
sequencing MEMBAR, enables the programmer to explicitly control order and 
completion for memory operations. Sequencing MEMBARs are needed only when a 
program requires that the effect of an operation becomes globally visible rather than 
simply being scheduled.¹ Because both forms are bit-encoded into the instruction, a
single MEMBAR can function both as an ordering MEMBAR and as a sequencing 
MEMBAR. 


The SPARC V9 instruction set architecture does not guarantee consistency between 
instruction and data spaces. A problem arises when instruction space is dynamically 
modified by a program writing to memory locations containing instructions
(self-modifying code). Examples are Lisp, debuggers, and dynamic linking. The FLUSH
instruction synchronizes instruction and data memory after instruction space has 
been modified. 


9.5.5.1 Ordering MEMBAR Instructions 


Ordering MEMBAR instructions induce an ordering in the instruction stream of a 
single virtual processor. Sets of loads and stores that appear before the MEMBAR in 
program order are ordered with respect to sets of loads and stores that follow the 
MEMBAR in program order. Atomic operations (LDSTUB(A), SWAP(A), CASA, and 
CASXA) are ordered by MEMBAR as if they were both a load and a store, since they 
share the semantics of both. An STBAR instruction, with semantics that are a subset 
of MEMBAR, is provided for SPARC V8 compatibility. MEMBAR and STBAR 
operate on all pending memory operations in the reorder buffer, independently of 
their address or ASI, ordering them with respect to all future memory operations. 


1. Sequencing MEMBARs are needed for some input/output operations, forcing stores into specialized stable
storage, context switching, and occasional other system functions. Using a sequencing MEMBAR when one is 
not needed may cause a degradation of performance. See Programming with the Memory Models, contained in 
the separate volume UltraSPARC Architecture Application Notes, for examples of the use of sequencing 
MEMBARs. 




This ordering applies only to memory-reference instructions issued by the virtual 
processor issuing the MEMBAR. Memory-reference instructions issued by other 
virtual processors are unaffected. 


The ordering relationships are bit-encoded as shown in TABLE 9-3. For example, 
MEMBAR 01₁₆, written as “membar #LoadLoad” in assembly language, requires
that all load operations appearing before the MEMBAR in program order complete
before any of the load operations following the MEMBAR in program order
complete. Store operations are unconstrained in this case. MEMBAR 08₁₆
(#StoreStore) is equivalent to the STBAR instruction; it requires that the values 
stored by store instructions appearing in program order prior to the STBAR 
instruction be visible to other virtual processors before issuing any store operations 
that appear in program order following the STBAR. 


In TABLE 9-3 these ordering relationships are specified by the “<m” symbol, which 
signifies memory order. See Appendix D, Formal Specification of the Memory Models, 
for a formal description of the <m relationship. 


TABLE 9-3 Ordering Relationships Selected by Mask

Ordering Relation      Assembly Language     Effective Behavior   Mask    mmask
(Earlier <m Later)     Constant Mnemonic     in TSO Model         Value   Bit #
Load <m Load           #LoadLoad             nop                  01₁₆    0
Store <m Load          #StoreLoad            #StoreLoad           02₁₆    1
Load <m Store          #LoadStore            nop                  04₁₆    2
Store <m Store         #StoreStore           nop                  08₁₆    3

Implementation Note | An UltraSPARC Architecture 2005 implementation that only
implements the TSO memory model may implement
MEMBAR #LoadLoad, MEMBAR #LoadStore, and
MEMBAR #StoreStore as nops and MEMBAR #StoreLoad
as a MEMBAR #Sync.


9.5.5.2 Sequencing MEMBAR Instructions 


A sequencing MEMBAR exerts explicit control over the completion of operations. 
The three sequencing MEMBAR options each have a different degree of control and 
a different application. 


■ Lookaside Barrier: Ensures that loads following this MEMBAR are from
memory and not from a lookaside into a write buffer. Lookaside Barrier requires
that pending stores issued prior to the MEMBAR be completed before any load
from that address following the MEMBAR may be issued. A Lookaside Barrier
MEMBAR may be needed to provide lock fairness and to support some plausible
I/O location semantics. See the example in “Control and Status Registers” in
Programming with the Memory Models, contained in the separate volume
UltraSPARC Architecture Application Notes.

■ Memory Issue Barrier: Ensures that all memory operations appearing in
program order before the sequencing MEMBAR complete before any new
memory operation may be initiated. See the example in “I/O Registers with Side
Effects” in Programming with the Memory Models, contained in the separate volume
UltraSPARC Architecture Application Notes.

■ Synchronization Barrier: Ensures that all instructions (memory reference and
others) preceding the MEMBAR complete and that the effects of any fault or error
have become visible before any instruction following the MEMBAR in program
order is initiated. A Synchronization Barrier MEMBAR fully synchronizes the
virtual processor that issues it.


TABLE 9-4 shows the encoding of these functions in the MEMBAR instruction.

TABLE 9-4 Sequencing Barrier Selected by Mask

Sequencing Function       Assembler Tag   Mask Value   cmask Bit #
Lookaside Barrier         #Lookaside      10₁₆         0
Memory Issue Barrier      #MemIssue       20₁₆         1
Synchronization Barrier   #Sync           40₁₆         2





Implementation Note | In UltraSPARC Architecture 2005 implementations,
MEMBAR #Lookaside and MEMBAR #MemIssue are
typically implemented as a MEMBAR #Sync.


For more details, see the MEMBAR instruction on page 260 of Chapter 7, Instructions. 
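Because the ordering and sequencing functions are bit-encoded, a combined mask is the bitwise OR of the values in TABLE 9-3 and TABLE 9-4. The sketch below builds and splits such a mask; the function names are hypothetical, and the split assumes the layout implied by the tables (mmask in the low four bits, cmask in the next three).

```python
MMASK = {"#LoadLoad": 0x01, "#StoreLoad": 0x02, "#LoadStore": 0x04, "#StoreStore": 0x08}
CMASK = {"#Lookaside": 0x10, "#MemIssue": 0x20, "#Sync": 0x40}


def membar_mask(*tags: str) -> int:
    """OR together the mask values of the given assembler tags."""
    mask = 0
    for tag in tags:
        if tag in MMASK:
            mask |= MMASK[tag]
        elif tag in CMASK:
            mask |= CMASK[tag]
        else:
            raise ValueError(f"unknown MEMBAR tag: {tag}")
    return mask


def split_mask(mask: int):
    """Return (mmask, cmask) as separate bit fields."""
    return mask & 0x0F, (mask >> 4) & 0x07
```

For example, `membar_mask("#StoreLoad", "#Sync")` combines an ordering function and a sequencing function in a single MEMBAR.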


9.5.5.3 Synchronizing Instruction and Data Memory 


The SPARC V9 memory models do not require that instruction and data memory 
images be consistent at all times. The instruction and data memory images may 
become inconsistent if a program writes into the instruction stream. As a result, 
whenever instructions are modified by a program in a context where the data (that 
is, the instructions) in the memory and the data cache hierarchy may be inconsistent 
with instructions in the instruction cache hierarchy, some special programmatic 
(software) action must be taken. 


The FLUSH instruction will ensure consistency between the in-flight instruction 
stream and the data references in the virtual processor executing FLUSH. The 
programmer must ensure that the modification sequence is robust under multiple 
updates and concurrent execution. Since, in general, loads and stores may be 
performed out of order, appropriate MEMBAR and FLUSH instructions must be 
interspersed as needed to control the order in which the instruction data are 
modified. 


The FLUSH instruction ensures that subsequent instruction fetches from the 
doubleword target of the FLUSH by the virtual processor executing the FLUSH 
appear to execute after any loads, stores, and atomic load-stores issued by the virtual
processor to that address prior to the FLUSH. FLUSH acts as a barrier for instruction
fetches in the virtual processor on which it executes and has the properties of a store 
with respect to MEMBAR operations. 


IMPL. DEP. #122-V9: The latency between the execution of FLUSH on one virtual 
processor and the point at which the modified instructions have replaced outdated 
instructions in a multiprocessor is implementation dependent. 


Programming Note | Because FLUSH is designed to act on a doubleword and
because, on some implementations, FLUSH may trap to system
software, it is recommended that system software provide a
user-callable service routine for flushing arbitrarily sized regions
of memory. On some implementations, this routine would issue
a series of FLUSH instructions; on others, it might issue a single
trap to system software that would then flush the entire region.
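For the variant of such a routine that issues a series of FLUSH instructions, the address arithmetic can be sketched as below. The generator name `flush_addresses` is hypothetical; it only computes the doubleword-aligned addresses that cover a region and does not itself execute any FLUSH.

```python
def flush_addresses(start: int, length: int):
    """Yield the doubleword-aligned addresses covering [start, start + length)."""
    if length <= 0:
        return
    addr = start & ~0x7          # round down to an 8-byte boundary
    end = start + length
    while addr < end:
        yield addr               # one FLUSH would be issued per address
        addr += 8
```

A region that starts in the middle of a doubleword needs one more FLUSH than its length alone suggests: 8 bytes starting at an address 4 bytes past a boundary span two doublewords.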





On an UltraSPARC Architecture virtual processor: 


■ A FLUSH instruction causes a synchronization with the virtual processor, which
flushes the instruction pipeline in the virtual processor on which the FLUSH
instruction is executed.

■ Coherency between instruction and data memories may or may not be
maintained by hardware. If it is, an UltraSPARC Architecture implementation
may ignore the address in the operands of a FLUSH instruction.


Programming Note | UltraSPARC Architecture virtual processors are not required to
maintain coherency between instruction and data caches in
hardware. Therefore, portable software must do the following:

(1) always assume that store instructions (except Block
Store with Commit) do not coherently update instruction
cache(s);

(2) in every FLUSH instruction, supply the address of the
instruction or instructions that were modified.





For more details, see the FLUSH instruction on page 174 of Chapter 7, Instructions. 





9.6 Nonfaulting Load


A nonfaulting load behaves like a normal load, with the following exceptions: 

■ A nonfaulting load from a location with side effects (TTE.e = 1) causes a
data_access_exception exception.

■ A nonfaulting load from a page marked for nonfault access only (TTE.nfo = 1) is
allowed; other types of accesses to such a page cause a data_access_exception
exception.

■ These loads are issued with ASI_PRIMARY_NO_FAULT[_LITTLE] or
ASI_SECONDARY_NO_FAULT[_LITTLE]. A store with a NO_FAULT ASI causes a
data_access_exception exception.
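The three rules above can be collected into one decision function. This sketch covers only the cases listed; the function name `access_result` and its parameters are hypothetical, and an "ok" result means only that none of these rules raises an exception.

```python
def access_result(kind: str, no_fault_asi: bool, tte_e: int, tte_nfo: int) -> str:
    """Apply the nonfaulting-load rules to one access.

    kind         -- "load" or "store"
    no_fault_asi -- True if the access uses one of the NO_FAULT ASIs
    tte_e        -- TTE.e (1 = location has side effects)
    tte_nfo      -- TTE.nfo (1 = page is marked for nonfault access only)
    """
    if kind not in ("load", "store"):
        raise ValueError("kind must be 'load' or 'store'")
    if no_fault_asi:
        if kind == "store":
            return "data_access_exception"   # stores may not use a NO_FAULT ASI
        if tte_e:
            return "data_access_exception"   # nonfaulting load from a side-effect location
        return "ok"                          # allowed, including when TTE.nfo = 1
    if tte_nfo:
        return "data_access_exception"       # normal access to a nonfault-only page
    return "ok"
```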











Typically, optimizers use nonfaulting loads to move loads across conditional control 
structures that guard their use. This technique potentially increases the distance 
between a load of data and the first use of that data, in order to hide latency. The 
technique allows more flexibility in instruction scheduling and improves 
performance in certain algorithms by removing address checking from the critical 
code path. 


For example, when following a linked list, nonfaulting loads allow the null pointer 
to be accessed safely in a speculative, read-ahead fashion; the page at virtual address 
0₁₆ can safely be accessed with no penalty. The TTE.nfo bit marks pages that are
mapped for safe access by nonfaulting loads but that can still cause a trap by other, 
normal accesses. 


Thus, programmers can trap on “wild” pointer references—many programmers 
count on an exception being generated when accessing address 0₁₆ to debug
software—while benefiting from the acceleration of nonfaulting access in debugged 
library routines. 





9.7 Store Coalescing


Cacheable stores may be coalesced with adjacent cacheable stores within an 8-byte
boundary offset in the store buffer to improve store bandwidth. Similarly,
non-side-effect noncacheable stores may be coalesced with adjacent non-side-effect
noncacheable stores within an 8-byte boundary offset in the store buffer.


To maintain strong ordering for I/O accesses, stores with the side-effect
attribute (e bit set) will not be combined with any other stores.


Stores that are separated by an intervening MEMBAR #Sync will not be coalesced. 
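The coalescing conditions can be sketched as a predicate over two stores in the store buffer. The `Store` type and function name are hypothetical, and "within an 8-byte boundary" is interpreted here as "in the same aligned 8-byte block", which is an assumption about the intended meaning.

```python
from dataclasses import dataclass


@dataclass
class Store:
    addr: int
    cacheable: bool
    side_effect: bool = False


def may_coalesce(a: Store, b: Store, membar_sync_between: bool = False) -> bool:
    """True if the two stores are candidates for coalescing under the rules above."""
    if membar_sync_between:
        return False                     # an intervening MEMBAR #Sync prevents it
    if a.side_effect or b.side_effect:
        return False                     # side-effect stores are never combined
    if a.cacheable != b.cacheable:
        return False                     # cacheable pairs only with cacheable
    # Assumed interpretation: both stores fall in the same aligned 8-byte block.
    return (a.addr >> 3) == (b.addr >> 3)
```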


CHAPTER 10


Address Space Identifiers (ASIs) 





This chapter describes address space identifiers (ASIs) in the following sections:


■ Address Space Identifiers and Address Spaces on page 399.
■ ASI Values on page 399.
■ ASI Assignments on page 400.
■ Special Memory Access ASIs on page 409.





10.1 Address Space Identifiers and Address 
Spaces 


An UltraSPARC Architecture processor provides an address space identifier (ASI) 
with every address sent to memory. The ASI does the following: 


■ Distinguishes between different address spaces
■ Provides an attribute that is unique to an address space
■ Maps internal control and diagnostics registers within a virtual processor


The memory management unit uses a 64-bit virtual address and an 8-bit ASI to 
generate a memory, I/O, or internal register address. 





10.2 ASI Values 


The range of address space identifiers (ASIs) is 00₁₆-FF₁₆. That range is divided into
restricted and unrestricted portions. ASIs in the range 80₁₆-FF₁₆ are unrestricted;
they may be accessed by software running in any privilege mode.




ASIs in the range 00₁₆-7F₁₆ are restricted; they may only be accessed by software
running in a mode with sufficient privilege for the particular ASI. ASIs in the range
00₁₆-2F₁₆ may only be accessed by software running in privileged or
hyperprivileged mode and ASIs in the range 30₁₆-7F₁₆ may only be accessed by
software running in hyperprivileged mode.


SPARC V9      | In SPARC V9, the range of ASIs was evenly divided into
Compatibility | restricted (00₁₆-7F₁₆) and unrestricted (80₁₆-FF₁₆) halves.
Note          |


An attempt by nonprivileged software to access a restricted (privileged or
hyperprivileged) ASI (00₁₆-7F₁₆) causes a privileged_action trap.


An attempt by privileged software to access a hyperprivileged ASI (30₁₆-7F₁₆) also
causes a privileged_action trap.
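

The range rules above reduce to two comparisons. The following Python sketch is illustrative only: the names Mode, required_mode, and privilege_check are hypothetical and not part of the architecture, hexadecimal literals such as 0x30 stand for the subscript-16 values in the text, and the sketch models only the range-based privilege check, not the per-ASI instruction restrictions listed in the table footnotes.

```python
from enum import IntEnum


class Mode(IntEnum):
    """Privilege modes, ordered from least to most privileged."""
    NONPRIVILEGED = 0
    PRIVILEGED = 1
    HYPERPRIVILEGED = 2


def required_mode(asi: int) -> Mode:
    """Return the least-privileged mode that may use the given ASI."""
    if not 0x00 <= asi <= 0xFF:
        raise ValueError("an ASI is an 8-bit value")
    if asi >= 0x80:
        return Mode.NONPRIVILEGED    # 80-FF: unrestricted
    if asi >= 0x30:
        return Mode.HYPERPRIVILEGED  # 30-7F: hyperprivileged only
    return Mode.PRIVILEGED           # 00-2F: privileged or hyperprivileged


def privilege_check(asi: int, mode: Mode) -> str:
    """Return "ok", or the name of the trap the access would cause."""
    return "ok" if mode >= required_mode(asi) else "privileged_action"
```

For example, privilege_check(0x10, Mode.NONPRIVILEGED) reports privileged_action, while the same ASI used from privileged mode passes this check.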


An ASI can be categorized based on how it affects the MMU’s treatment of the 
accompanying address, into one of three categories: 


m A Translating ASI (the most common type) causes the accompanying address to be 
treated as a virtual address (which is translated by the MMU). 


m A Non-translating ASI is not translated by the MMU; instead the address is passed
through unchanged. Non-translating ASIs are typically used for accessing internal
registers.


m A Real ASI causes the accompanying address to be treated as a real address. An 
access using a Real ASI can cause exception(s) only visible in hyperprivileged 
mode. Real ASIs are typically used by privileged software for directly accessing 
memory using real (as opposed to virtual) addresses. 


Implementation-dependent ASIs may or may not be translated by the MMU. See 
implementation-specific documentation for detailed information about 
implementation-dependent ASIs. 





10.3 ASI Assignments 


Every load or store address in an UltraSPARC Architecture processor has an 8-bit 
Address Space Identifier (ASI) appended to the virtual address (VA). The VA plus 
the ASI fully specify the address. 


For instruction fetches and for data loads, stores, and load-stores that do not use the
load or store alternate instructions, the ASI is an implicit ASI generated by the
virtual processor.


If a load alternate, store alternate, or load-store alternate instruction is used, the
value of the ASI (an "explicit ASI") can be specified in the ASI register or as an
immediate value in the instruction.


In practice, ASIs are used not only to differentiate address spaces but also
for other functions, such as referencing registers in the MMU.


10.3.1 Supported ASIs


TABLE 10-1 lists architecturally-defined ASIs; some are in all UltraSPARC Architecture 
implementations and some are only present in some implementations. 


An ASI marked with a closed bullet (●) is required to be implemented on all
UltraSPARC Architecture 2005 processors.


An ASI marked with an open bullet (○) is defined by the UltraSPARC Architecture
2005 but is not necessarily implemented in all UltraSPARC Architecture 2005
processors; its implementation is optional. Across all implementations on which it is
implemented, it appears to software to behave identically.


Some ASIs may only be used with certain load or store instructions; see table 
footnotes for details. 


The word "decoded" in the Virtual Address column of TABLE 10-1 indicates that
the supplied virtual address is decoded by the virtual processor.


The "V / non-T / R" column of the table indicates whether each ASI is a Translating
ASI (translates from Virtual), non-Translating ASI, or Real ASI, respectively.


ASIs marked "Reserved" are set aside for use in future revisions to the architecture
and are not to be used by implementations. ASIs marked "implementation
dependent" may be used for implementation-specific purposes.


Attempting to access an address space described as “Implementation dependent” in 
TABLE 10-1 produces implementation-dependent results. 




















TABLE 10-1 UltraSPARC Architecture ASIs

ASI Value | req'd (●) / opt'l (○) | ASI Name (and Abbreviation) | Access Type(s) | Virtual Address (VA) | V / non-T / R | Shared / per strand | Description
00₁₆-03₁₆ | ○ | - | -[2,12] | - | - | - | Implementation dependent[1]
04₁₆ | ● | ASI_NUCLEUS (ASI_N) | RW[2,4] | (decoded) | V | - | Implicit address space, nucleus context, TL > 0
05₁₆-0B₁₆ | ○ | - | -[2,12] | - | - | - | Implementation dependent[1]
0C₁₆ | ● | ASI_NUCLEUS_LITTLE (ASI_NL) | RW[2,4] | (decoded) | V | - | Implicit address space, nucleus context, TL > 0, little-endian
0D₁₆-0F₁₆ | ○ | - | -[2,12] | - | - | - | Implementation dependent[1]
10₁₆ | ● | ASI_AS_IF_USER_PRIMARY (ASI_AIUP) | RW[2,4,18] | (decoded) | V | - | Primary address space, as if user (nonprivileged)
11₁₆ | ● | ASI_AS_IF_USER_SECONDARY (ASI_AIUS) | RW[2,4,18] | (decoded) | V | - | Secondary address space, as if user (nonprivileged)
12₁₆-13₁₆ | ○ | - | -[2,12] | - | - | - | Implementation dependent[1]
14₁₆ | ○ | ASI_REAL | RW[2,4] | (decoded) | R | - | Real address
15₁₆ | ○ | ASI_REAL_IO | RW[2,5] | (decoded) | R | - | Real address, noncacheable, with side effect (deprecated)
16₁₆ | ○ | ASI_BLOCK_AS_IF_USER_PRIMARY (ASI_BLK_AIUP) | RW[2,8,14,18] | (decoded) | V | - | Primary address space, block load/store, as if user (nonprivileged)
17₁₆ | ○ | ASI_BLOCK_AS_IF_USER_SECONDARY (ASI_BLK_AIUS) | RW[2,8,14,18] | (decoded) | V | - | Secondary address space, block load/store, as if user (nonprivileged)
18₁₆ | ● | ASI_AS_IF_USER_PRIMARY_LITTLE (ASI_AIUPL) | RW[2,4,18] | (decoded) | V | - | Primary address space, as if user (nonprivileged), little-endian
19₁₆ | ● | ASI_AS_IF_USER_SECONDARY_LITTLE (ASI_AIUSL) | RW[2,4,18] | (decoded) | V | - | Secondary address space, as if user (nonprivileged), little-endian
1A₁₆-1B₁₆ | ○ | - | -[2,12] | - | - | - | Implementation dependent[1]
1C₁₆ | ○ | ASI_REAL_LITTLE (ASI_REAL_L) | RW[2,4] | (decoded) | R | - | Real address, little-endian
1D₁₆ | ○ | ASI_REAL_IO_LITTLE (ASI_REAL_IO_L) | RW[2,5] | (decoded) | R | - | Real address, noncacheable, with side effect, little-endian (deprecated)
1E₁₆ | ○ | ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE (ASI_BLK_AIUPL) | RW[2,8,14,18] | (decoded) | V | - | Primary address space, block load/store, as if user (nonprivileged), little-endian
1F₁₆ | ○ | ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE (ASI_BLK_AIUSL) | RW[2,8,14,18] | (decoded) | V | - | Secondary address space, block load/store, as if user (nonprivileged), little-endian
20₁₆ | ○ | ASI_SCRATCHPAD | RW[2,6] | (decoded; see below) | non-T | per strand | Privileged Scratchpad registers; implementation dependent[1]
" | ○ | " | " | 00₁₆ | " | " | Scratchpad Register 0[1]
" | ○ | " | " | 08₁₆ | " | " | Scratchpad Register 1[1]
" | ○ | " | " | 10₁₆ | " | " | Scratchpad Register 2[1]
" | ○ | " | " | 18₁₆ | " | " | Scratchpad Register 3[1]
" | ○ | " | " | 20₁₆ | " | " | Scratchpad Register 4[1]
" | ○ | " | " | 28₁₆ | " | " | Scratchpad Register 5[1]
" | ○ | " | " | 30₁₆ | " | " | Scratchpad Register 6[1]
" | ○ | " | " | 38₁₆ | " | " | Scratchpad Register 7[1]
21₁₆ | ○ | ASI_MMU_CONTEXTID | RW[2,6] | (decoded; see below) | non-T | per strand | MMU context registers
" | ○ | " | " | 08₁₆ | " | " | I/D MMU Primary Context ID register
" | ○ | " | " | 10₁₆ | " | " | I/D MMU Secondary Context ID register
22₁₆ | ○ | ASI_TWINX_AS_IF_USER_PRIMARY (ASI_TWINX_AIUP) | R[2,7,11] | (decoded) | V | - | Primary address space, 128-bit atomic load twin extended word, as if user (nonprivileged)
23₁₆ | ○ | ASI_TWINX_AS_IF_USER_SECONDARY (ASI_TWINX_AIUS) | R[2,7,11] | (decoded) | V | - | Secondary address space, 128-bit atomic load twin extended word, as if user (nonprivileged)
24₁₆ | ○ | - | - | - | - | - | Implementation dependent[1]
25₁₆ | ○ | ASI_QUEUE | (see below) | (decoded; see below) | non-T | per strand |
" | ○ | " | RW[2,6] | 3C0₁₆ | " | " | CPU Mondo Queue Head Pointer
" | ○ | " | RW[2,6,17] | 3C8₁₆ | " | " | CPU Mondo Queue Tail Pointer
" | ○ | " | RW[2,6] | 3D0₁₆ | " | " | Device Mondo Queue Head Pointer
" | ○ | " | RW[2,6,17] | 3D8₁₆ | " | " | Device Mondo Queue Tail Pointer
" | ○ | " | RW[2,6] | 3E0₁₆ | " | " | Resumable Error Queue Head Pointer
" | ○ | " | RW[2,6,17] | 3E8₁₆ | " | " | Resumable Error Queue Tail Pointer
" | ○ | " | RW[2,6] | 3F0₁₆ | " | " | Nonresumable Error Queue Head Pointer
" | ○ | " | RW[2,6,17] | 3F8₁₆ | " | " | Nonresumable Error Queue Tail Pointer
26₁₆ | ○ | ASI_TWINX_REAL (ASI_TWINX_R), ASI_QUAD_LDD_REAL[†] | R[2,7,11] | (decoded) | R | - | 128-bit atomic twin extended-word load from real address
27₁₆ | ○ | ASI_TWINX_NUCLEUS (ASI_TWINX_N) | R[2,7,11] | (decoded) | V | - | Nucleus context, 128-bit atomic load twin extended-word
28₁₆-29₁₆ | ○ | - | -[2,12] | - | - | - | Implementation dependent[1]
2A₁₆ | ○ | ASI_TWINX_AS_IF_USER_PRIMARY_LITTLE (ASI_TWINX_AIUP_L) | R[2,7,11] | (decoded) | V | - | Primary address space, 128-bit atomic load twin extended-word, as if user (nonprivileged), little-endian
2B₁₆ | ○ | ASI_TWINX_AS_IF_USER_SECONDARY_LITTLE (ASI_TWINX_AIUS_L) | R[2,7,11] | (decoded) | V | - | Secondary address space, 128-bit atomic load twin extended-word, as if user (nonprivileged), little-endian
2C₁₆ | ○ | - | -[2,12] | - | - | - | Implementation dependent[1]
2D₁₆ | ○ | - | -[2,12] | - | - | - | Implementation dependent[1]
2E₁₆ | ○ | ASI_TWINX_REAL_LITTLE (ASI_TWINX_REAL_L), ASI_QUAD_LDD_REAL_LITTLE[†] | R[2,7,11] | (decoded) | R | - | 128-bit atomic twin extended-word load from real address, little-endian
2F₁₆ | ○ | ASI_TWINX_NUCLEUS_LITTLE (ASI_TWINX_NL) | R[2,7,11] | (decoded) | V | - | Nucleus context, 128-bit atomic load twin extended-word, little-endian
30₁₆-7F₁₆ | ○ | - | - | - | - | - | Reserved for use in hyperprivileged mode
45₁₆ | ○ | - | -[3,13] | - | - | - | Implementation dependent[1]
46₁₆-48₁₆ | ○ | - | -[3,13] | - | - | - | Implementation dependent[1]
49₁₆ | ○ | - | -[3,13] | - | - | - | Implementation dependent[1]
4A₁₆-4B₁₆ | ○ | - | -[3,13] | - | - | - | Implementation dependent[1]
4C₁₆ | ○ | Error Status and Enable Registers | - | - | - | - | Implementation dependent[1]
80₁₆ | ● | ASI_PRIMARY (ASI_P) | RW[4] | (decoded) | V | - | Implicit primary address space
81₁₆ | ● | ASI_SECONDARY (ASI_S) | RW[4] | (decoded) | V | - | Secondary address space
82₁₆ | ● | ASI_PRIMARY_NO_FAULT (ASI_PNF) | R[9,11] | (decoded) | V | - | Primary address space, no fault
83₁₆ | ● | ASI_SECONDARY_NO_FAULT (ASI_SNF) | R[9,11] | (decoded) | V | - | Secondary address space, no fault
84₁₆-87₁₆ | ○ | - | -[16] | - | - | - | Reserved
88₁₆ | ● | ASI_PRIMARY_LITTLE (ASI_PL) | RW[4] | (decoded) | V | - | Implicit primary address space, little-endian
89₁₆ | ● | ASI_SECONDARY_LITTLE (ASI_SL) | RW[4] | (decoded) | V | - | Secondary address space, little-endian
8A₁₆ | ● | ASI_PRIMARY_NO_FAULT_LITTLE (ASI_PNFL) | R[9,11] | (decoded) | V | - | Primary address space, no fault, little-endian
8B₁₆ | ● | ASI_SECONDARY_NO_FAULT_LITTLE (ASI_SNFL) | R[9,11] | (decoded) | V | - | Secondary address space, no fault, little-endian
8C₁₆-BF₁₆ | ○ | - | -[16] | - | - | - | Reserved
C0₁₆ | ○ | ASI_PST8_PRIMARY (ASI_PST8_P) | W[8,10,14] | (decoded) | V | - | Primary address space, 8x8-bit partial store
C1₁₆ | ○ | ASI_PST8_SECONDARY (ASI_PST8_S) | W[8,10,14] | (decoded) | V | - | Secondary address space, 8x8-bit partial store
C2₁₆ | ○ | ASI_PST16_PRIMARY (ASI_PST16_P) | W[8,10,14] | (decoded) | V | - | Primary address space, 4x16-bit partial store
C3₁₆ | ○ | ASI_PST16_SECONDARY (ASI_PST16_S) | W[8,10,14] | (decoded) | V | - | Secondary address space, 4x16-bit partial store
C4₁₆ | ○ | ASI_PST32_PRIMARY (ASI_PST32_P) | W[8,10,14] | (decoded) | V | - | Primary address space, 2x32-bit partial store
C5₁₆ | ○ | ASI_PST32_SECONDARY (ASI_PST32_S) | W[8,10,14] | (decoded) | V | - | Secondary address space, 2x32-bit partial store
C6₁₆-C7₁₆ | ○ | - | -[15] | - | - | - | Implementation dependent[1]
C8₁₆ | ○ | ASI_PST8_PRIMARY_LITTLE (ASI_PST8_PL) | W[8,10,14] | (decoded) | V | - | Primary address space, 8x8-bit partial store, little-endian
C9₁₆ | ○ | ASI_PST8_SECONDARY_LITTLE (ASI_PST8_SL) | W[8,10,14] | (decoded) | V | - | Secondary address space, 8x8-bit partial store, little-endian
CA₁₆ | ○ | ASI_PST16_PRIMARY_LITTLE (ASI_PST16_PL) | W[8,10,14] | (decoded) | V | - | Primary address space, 4x16-bit partial store, little-endian
CB₁₆ | ○ | ASI_PST16_SECONDARY_LITTLE (ASI_PST16_SL) | W[8,10,14] | (decoded) | V | - | Secondary address space, 4x16-bit partial store, little-endian
CC₁₆ | ○ | ASI_PST32_PRIMARY_LITTLE (ASI_PST32_PL) | W[8,10,14] | (decoded) | V | - | Primary address space, 2x32-bit partial store, little-endian
CD₁₆ | ○ | ASI_PST32_SECONDARY_LITTLE (ASI_PST32_SL) | W[8,10,14] | (decoded) | V | - | Secondary address space, 2x32-bit partial store, little-endian
CE₁₆-CF₁₆ | ○ | - | -[15] | - | - | - | Implementation dependent[1]
D0₁₆ | ○ | ASI_FL8_PRIMARY (ASI_FL8_P) | RW[8,14] | (decoded) | V | - | Primary address space, one 8-bit floating-point load/store
D1₁₆ | ○ | ASI_FL8_SECONDARY (ASI_FL8_S) | RW[8,14] | (decoded) | V | - | Secondary address space, one 8-bit floating-point load/store
D2₁₆ | ○ | ASI_FL16_PRIMARY (ASI_FL16_P) | RW[8,14] | (decoded) | V | - | Primary address space, one 16-bit floating-point load/store
D3₁₆ | ○ | ASI_FL16_SECONDARY (ASI_FL16_S) | RW[8,14] | (decoded) | V | - | Secondary address space, one 16-bit floating-point load/store
D4₁₆-D7₁₆ | ○ | - | -[15] | - | - | - | Implementation dependent[1]
D8₁₆ | ○ | ASI_FL8_PRIMARY_LITTLE (ASI_FL8_PL) | RW[8,14] | (decoded) | V | - | Primary address space, one 8-bit floating-point load/store, little-endian
D9₁₆ | ○ | ASI_FL8_SECONDARY_LITTLE (ASI_FL8_SL) | RW[8,14] | (decoded) | V | - | Secondary address space, one 8-bit floating-point load/store, little-endian
DA₁₆ | ○ | ASI_FL16_PRIMARY_LITTLE (ASI_FL16_PL) | RW[8,14] | (decoded) | V | - | Primary address space, one 16-bit floating-point load/store, little-endian
DB₁₆ | ○ | ASI_FL16_SECONDARY_LITTLE (ASI_FL16_SL) | RW[8,14] | (decoded) | V | - | Secondary address space, one 16-bit floating-point load/store, little-endian
DC₁₆-DF₁₆ | ○ | - | -[15] | - | - | - | Implementation dependent[1]
E0₁₆-E1₁₆ | ○ | - | -[15] | - | - | - | Reserved
E2₁₆ | ○ | ASI_TWINX_PRIMARY (ASI_TWINX_P) | R[7,11] | (decoded) | V | - | Primary address space, 128-bit atomic load twin extended word
E3₁₆ | ○ | ASI_TWINX_SECONDARY (ASI_TWINX_S) | R[7,11] | (decoded) | V | - | Secondary address space, 128-bit atomic load twin extended-word
E4₁₆-E9₁₆ | ○ | - | -[15] | - | - | - | Implementation dependent[1]
EA₁₆ | ○ | ASI_TWINX_PRIMARY_LITTLE (ASI_TWINX_PL) | R[7,11] | (decoded) | V | - | Primary address space, 128-bit atomic load twin extended word, little-endian
EB₁₆ | ○ | ASI_TWINX_SECONDARY_LITTLE (ASI_TWINX_SL) | R[7,11] | (decoded) | V | - | Secondary address space, 128-bit atomic load twin extended word, little-endian
EC₁₆-EF₁₆ | ○ | - | -[15] | - | - | - | Implementation dependent[1]
F0₁₆ | ○ | ASI_BLOCK_PRIMARY (ASI_BLK_P) | RW[8,14] | (decoded) | V | - | Primary address space, 8x8-byte block load/store
F1₁₆ | ○ | ASI_BLOCK_SECONDARY (ASI_BLK_S) | RW[8,14] | (decoded) | V | - | Secondary address space, 8x8-byte block load/store
F2₁₆-F5₁₆ | ○ | - | -[15] | - | - | - | Implementation dependent[1]
F6₁₆-F7₁₆ | ○ | - | - | - | - | - | Implementation dependent[1]
F8₁₆ | ○ | ASI_BLOCK_PRIMARY_LITTLE (ASI_BLK_PL) | RW[8,14] | (decoded) | V | - | Primary address space, 8x8-byte block load/store, little-endian
F9₁₆ | ○ | ASI_BLOCK_SECONDARY_LITTLE (ASI_BLK_SL) | RW[8,14] | (decoded) | V | - | Secondary address space, 8x8-byte block load/store, little-endian
FA₁₆-FD₁₆ | ○ | - | -[15] | - | - | - | Implementation dependent[1]
FE₁₆-FF₁₆ | ○ | - | - | - | - | - | Implementation dependent[1]


† This ASI name has been changed, for consistency; although use of this name is
deprecated and software should use the new name, the old name is listed here for
compatibility.

1 Implementation dependent ASI (impl. dep. #29); available for use by implementors.
Software that references this ASI may not be portable.

2 An attempted load alternate, store alternate, atomic alternate or prefetch alternate
instruction to this ASI in nonprivileged mode causes a privileged_action exception.

3 An attempted load alternate, store alternate, atomic alternate or prefetch alternate
instruction to this ASI in nonprivileged mode or privileged mode causes a
privileged_action exception.

4 May be used with all load alternate, store alternate, atomic alternate and prefetch
alternate instructions (CASA, CASXA, LDSTUBA, LDTWA, LDDFA, LDFA, LDSBA,
LDSHA, LDSWA, LDUBA, LDUHA, LDUWA, LDXA, PREFETCHA, STBA, STTWA,
STDFA, STFA, STHA, STWA, STXA, SWAPA).

5 May be used with all of the following load alternate and store alternate instructions:
LDTWA, LDDFA, LDFA, LDSBA, LDSHA, LDSWA, LDUBA, LDUHA, LDUWA, LDXA,
STBA, STTWA, STDFA, STFA, STHA, STWA, STXA. Use with an atomic alternate or
prefetch alternate instruction (CASA, CASXA, LDSTUBA, SWAPA or PREFETCHA)
causes a data_access_exception exception.

6 May only be used in a LDXA or STXA instruction for RW ASIs, LDXA for read-only ASIs
and STXA for write-only ASIs. Use of LDXA for write-only ASIs, STXA for read-only
ASIs, or any other load alternate, store alternate, atomic alternate or prefetch alternate
instruction causes a data_access_exception exception.

7 May only be used in an LDTXA instruction. Use of this ASI in any other load alternate,
store alternate, atomic alternate or prefetch alternate instruction causes a
data_access_exception exception.

8 May only be used in a LDDFA or STDFA instruction for RW ASIs, LDDFA for read-only
ASIs and STDFA for write-only ASIs. Use of LDDFA for write-only ASIs, STDFA for
read-only ASIs, or any other load alternate, store alternate, atomic alternate or prefetch
alternate instruction causes a data_access_exception exception.

9 May be used with all of the following load and prefetch alternate instructions: LDTWA,
LDDFA, LDFA, LDSBA, LDSHA, LDSWA, LDUBA, LDUHA, LDUWA, LDXA,
PREFETCHA. Use with an atomic alternate or store alternate instruction causes a
data_access_exception exception.

10 Write(store)-only ASI; an attempted load alternate, atomic alternate, or prefetch alternate
instruction to this ASI causes a data_access_exception exception.

11 Read(load)-only ASI; an attempted store alternate or atomic alternate instruction to this
ASI causes a data_access_exception exception.

12 An attempted load alternate, store alternate, atomic alternate or prefetch alternate
instruction to this ASI in privileged mode causes a data_access_exception exception.

14 An attempted access to this ASI may cause an exception (see Special Memory Access ASIs
on page 409 for details).

15 An attempted load alternate, store alternate, atomic alternate or prefetch alternate
instruction to this ASI in any mode causes a data_access_exception exception if this ASI
is not implemented by the model dependent implementation.

16 An attempted load alternate, store alternate, atomic alternate or prefetch alternate
instruction to a reserved ASI in any mode causes a data_access_exception exception.

17 The Queue Tail Registers (ASI 25₁₆) are read-only. An attempted write to the Queue Tail
Registers causes a data_access_exception exception.





10.4 Special Memory Access ASIs


This section describes special memory access ASIs that are not described in other
sections.


10.4.1 ASIs 10₁₆, 11₁₆, 16₁₆, and 17₁₆
(ASI_*AS_IF_USER_*)


These ASIs are intended to be used in accesses from privileged mode, but are
processed as if they were issued from nonprivileged mode. Therefore, they are
subject to privilege-related exceptions. They are distinguished from each other by
the context from which the access is made, as described in TABLE 10-2.


When one of these ASIs is specified in a load alternate or store alternate instruction,
the virtual processor behaves as follows:

m In nonprivileged mode, a privileged_action exception occurs

m In any other privilege mode:

  a If U/DMMU TTE.p = 1, a data_access_exception (privilege violation)
    exception occurs

  a Otherwise, the access occurs and its endianness is determined by the
    U/DMMU TTE.ie bit. If U/DMMU TTE.ie = 0, the access is big-endian;
    otherwise, it is little-endian.
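

The decision procedure above can be written as a short function. This is a minimal Python sketch under stated assumptions: the function name and its string results are hypothetical, the mode is reduced to a single nonprivileged flag, and the little_asi parameter is an addition that covers the *_LITTLE variants of section 10.4.2, whose endianness is the opposite of the corresponding non-little-endian ASI.

```python
def as_if_user_access(nonprivileged: bool, tte_p: int, tte_ie: int,
                      little_asi: bool = False) -> str:
    """Outcome of a load/store alternate that uses an ASI_*AS_IF_USER_* ASI.

    Returns the name of the exception raised, or the endianness of the
    access when it proceeds.
    """
    if nonprivileged:
        return "privileged_action"
    if tte_p == 1:
        # The page is privileged, but the access is treated as a user access.
        return "data_access_exception"
    # TTE.ie inverts the default endianness; a *_LITTLE ASI inverts it again.
    little = (tte_ie == 1) != little_asi
    return "little-endian" if little else "big-endian"
```

With ASI 10₁₆ issued from privileged mode to a nonprivileged page with TTE.ie = 0, the call as_if_user_access(False, 0, 0) reports a big-endian access.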





TABLE 10-2 Privileged ASI_*AS_IF_USER_* ASIs

ASI | Names | Addressing (Context) | Endianness of Access
10₁₆ | ASI_AS_IF_USER_PRIMARY (ASI_AIUP) | Virtual (Primary) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
11₁₆ | ASI_AS_IF_USER_SECONDARY (ASI_AIUS) | Virtual (Secondary) | "
16₁₆ | ASI_BLOCK_AS_IF_USER_PRIMARY (ASI_BLK_AIUP) | Virtual (Primary) | "
17₁₆ | ASI_BLOCK_AS_IF_USER_SECONDARY (ASI_BLK_AIUS) | Virtual (Secondary) | "





10.4.2 ASIs 18₁₆, 19₁₆, 1E₁₆, and 1F₁₆
(ASI_*AS_IF_USER_*_LITTLE)


These ASIs are little-endian versions of ASIs 10₁₆, 11₁₆, 16₁₆, and 17₁₆
(ASI_*AS_IF_USER_*), described in section 10.4.1. Each operates identically to the
corresponding non-little-endian ASI, except that if an access occurs its endianness is
the opposite of that for the corresponding non-little-endian ASI.





These ASIs are intended to be used in accesses from privileged mode, but are
processed as if they were issued from nonprivileged mode. Therefore, they are
subject to privilege-related exceptions. They are distinguished from each other by
the context from which the access is made, as described in TABLE 10-3.


When one of these ASIs is specified in a load alternate or store alternate instruction,
the virtual processor behaves as follows:

m In nonprivileged mode, a privileged_action exception occurs

m In any other privilege mode:

  a If U/DMMU TTE.p = 1, a data_access_exception (privilege violation)
    exception occurs

  a Otherwise, the access occurs and its endianness is determined by the
    U/DMMU TTE.ie bit. If U/DMMU TTE.ie = 0, the access is little-endian;
    otherwise, it is big-endian.


TABLE 10-3 Privileged ASI_*AS_IF_USER_*_LITTLE ASIs

ASI | Names | Addressing (Context) | Endianness of Access
18₁₆ | ASI_AS_IF_USER_PRIMARY_LITTLE (ASI_AIUPL) | Virtual (Primary) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1
19₁₆ | ASI_AS_IF_USER_SECONDARY_LITTLE (ASI_AIUSL) | Virtual (Secondary) | "
1E₁₆ | ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE (ASI_BLK_AIUPL) | Virtual (Primary) | "
1F₁₆ | ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE (ASI_BLK_AIUSL) | Virtual (Secondary) | "


10.4.3 ASI 14₁₆ (ASI_REAL)


When ASI_REAL is specified in any load alternate, store alternate or prefetch
alternate instruction, the virtual processor behaves as follows:


m In nonprivileged mode, a privileged_action exception occurs
m In any other privilege mode:
  a VA is passed through to RA
  a During the address translation, context values are disregarded.
  a The endianness of the access is determined by the U/DMMU TTE.ie bit; if
    U/DMMU TTE.ie = 0, the access is big-endian, otherwise it is little-endian.


Even if data address translation is disabled, an access with this ASI is still a
cacheable access.


10.4.4 ASI 15₁₆ (ASI_REAL_IO)


Accesses with ASI_REAL_IO bypass the external cache and behave as if the side
effect bit (TTE.e bit) is set. When this ASI is specified in any load alternate or store
alternate instruction, the virtual processor behaves as follows:


m In nonprivileged mode, a privileged_action exception occurs


m If used with a CASA, CASXA, LDSTUBA, SWAPA, or PREFETCHA instruction, a
data_access_exception exception occurs


m Used with any other load alternate or store alternate instruction, in privileged
mode:

  a VA is passed through to RA

  a During the address translation, context values are disregarded.

  a The endianness of the access is determined by the U/DMMU TTE.ie bit; if
    U/DMMU TTE.ie = 0, the access is big-endian, otherwise it is little-endian.


10.4.5 ASI 1C₁₆ (ASI_REAL_LITTLE)


ASI_REAL_LITTLE is a little-endian version of ASI 14₁₆ (ASI_REAL). It operates
identically to ASI_REAL, except that if an access occurs, its endianness is the
opposite of that for ASI_REAL.




















10.4.6 ASI 1D₁₆ (ASI_REAL_IO_LITTLE)


ASI_REAL_IO_LITTLE is a little-endian version of ASI 15₁₆ (ASI_REAL_IO). It
operates identically to ASI_REAL_IO, except that if an access occurs, its endianness
is the opposite of that for ASI_REAL_IO.




















10.4.7 ASIs 22₁₆, 23₁₆, 27₁₆, 2A₁₆, 2B₁₆, 2F₁₆
(Privileged Load Integer Twin Extended Word)


ASIs 22₁₆, 23₁₆, 27₁₆, 2A₁₆, 2B₁₆, and 2F₁₆ exist for use with the (nonportable)
LDTXA instruction as atomic Load Integer Twin Extended Word operations (see Load
Integer Twin Extended Word from Alternate Space on page 255). These ASIs are
distinguished by the context from which the access is made and the endianness of
the access, as described in TABLE 10-4.


TABLE 10-4 Privileged Load Integer Twin Extended Word / Block Store Init ASIs

ASI | Names | Addressing (Context) | Endianness of Access
22₁₆ | ASI_TWINX_AS_IF_USER_PRIMARY (ASI_TWINX_AIUP) | Virtual (Primary) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
23₁₆ | ASI_TWINX_AS_IF_USER_SECONDARY (ASI_TWINX_AIUS) | Virtual (Secondary) | "
27₁₆ | ASI_TWINX_NUCLEUS (ASI_TWINX_N) | Virtual (Nucleus) | "
2A₁₆ | ASI_TWINX_AS_IF_USER_PRIMARY_LITTLE (ASI_TWINX_AIUP_L) | Virtual (Primary) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1
2B₁₆ | ASI_TWINX_AS_IF_USER_SECONDARY_LITTLE (ASI_TWINX_AIUS_L) | Virtual (Secondary) | "
2F₁₆ | ASI_TWINX_NUCLEUS_LITTLE (ASI_TWINX_NL) | Virtual (Nucleus) | "


When these ASIs are used with LDTXA, a mem_address_not_aligned exception is
generated if the operand address is not 16-byte aligned.


If these ASIs are used with any other Load Alternate, Store Alternate, Atomic Load-
Store Alternate, or PREFETCHA instruction, a data_access_exception exception is
always generated and mem_address_not_aligned is not generated.


Compatibility | These ASIs replaced ASIs 24₁₆ and 2C₁₆ used in earlier
Note          | UltraSPARC implementations; see the Compatibility Note
              | on page 418 for details.


10.4.8 ASIs 26₁₆ and 2E₁₆ (Privileged Load Integer Twin
Extended Word, Real Addressing)


ASIs 26₁₆ and 2E₁₆ exist for use with the LDTXA instruction as atomic Load Integer
Twin Extended Word operations using Real addressing (see Load Integer Twin
Extended Word from Alternate Space on page 255). These two ASIs are distinguished by
the endianness of the access, as described in TABLE 10-5.




TABLE 10-5 Load Integer Twin Extended Word (Real) ASIs

ASI | Name | Addressing (Context) | Endianness of Access
26₁₆ | ASI_TWINX_REAL (ASI_TWINX_R) | Real (-) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
2E₁₆ | ASI_TWINX_REAL_LITTLE (ASI_TWINX_REAL_L) | Real (-) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1





When these ASIs are used with LDTXA, a mem_address_not_aligned exception is
generated if the operand address is not 16-byte aligned.


If these ASIs are used with any other Load Alternate, Store Alternate, Atomic Load-
Store Alternate, or PREFETCHA instruction, a data_access_exception exception is
always generated and mem_address_not_aligned is not generated.
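

The two rules above fix an order of checks: the instruction is examined first, and alignment matters only for LDTXA. A minimal Python sketch follows, assuming a hypothetical helper name and representing the instruction by its mnemonic string. It deliberately leaves out the privilege check, because this section does not state how privileged_action is prioritized against the other two exceptions.

```python
TWINX_REAL_ASIS = {0x26, 0x2E}  # ASI_TWINX_REAL, ASI_TWINX_REAL_LITTLE


def twinx_real_check(mnemonic: str, address: int) -> str:
    """Return "ok" or the exception raised for an access using ASI 26 or 2E."""
    if mnemonic != "LDTXA":
        # Any other alternate-space instruction fails regardless of alignment.
        return "data_access_exception"
    if address % 16 != 0:
        # LDTXA loads 128 bits, so the operand must be 16-byte aligned.
        return "mem_address_not_aligned"
    return "ok"
```

An LDXA to a misaligned address therefore reports data_access_exception, not mem_address_not_aligned.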


Compatibility | These ASIs replaced ASIs 34₁₆ and 3C₁₆ used in earlier
Note          | UltraSPARC implementations; see the Compatibility Note on
              | page 418 for details.


10.4.9 ASIs E2₁₆, E3₁₆, EA₁₆, EB₁₆
(Nonprivileged Load Integer Twin Extended Word)


ASIs E2₁₆, E3₁₆, EA₁₆, and EB₁₆ exist for use with the (nonportable) LDTXA
instruction as atomic Load Integer Twin Extended Word operations (see Load Integer
Twin Extended Word from Alternate Space on page 255). These ASIs are distinguished
by the address space accessed (Primary or Secondary) and the endianness of the
access, as described in TABLE 10-6.


TABLE 10-6 Load Integer Twin Extended Word ASIs

ASI | Names | Addressing (Context) | Endianness of Access
E2₁₆ | ASI_TWINX_PRIMARY (ASI_TWINX_P) | Virtual (Primary) | Big-endian when U/DMMU TTE.ie = 0; little-endian when U/DMMU TTE.ie = 1
E3₁₆ | ASI_TWINX_SECONDARY (ASI_TWINX_S) | Virtual (Secondary) | "
EA₁₆ | ASI_TWINX_PRIMARY_LITTLE (ASI_TWINX_PL) | Virtual (Primary) | Little-endian when U/DMMU TTE.ie = 0; big-endian when U/DMMU TTE.ie = 1
EB₁₆ | ASI_TWINX_SECONDARY_LITTLE (ASI_TWINX_SL) | Virtual (Secondary) | "


When these ASIs are used with LDTXA, a mem_address_not_aligned exception is
generated if the operand address is not 16-byte aligned.


If these ASIs are used with any other Load Alternate, Store Alternate, Atomic Load-
Store Alternate, or PREFETCHA instruction, a data_access_exception exception is
always generated and mem_address_not_aligned is not generated.


10.4.10 Block Load and Store ASIs


ASIs 16₁₆, 17₁₆, 1E₁₆, 1F₁₆, F0₁₆, F1₁₆, F8₁₆, and F9₁₆ exist for use with LDDFA and
STDFA instructions as Block Load (LDBLOCKF) and Block Store (STBLOCKF)
operations (see Block Load on page 232 and Block Store on page 317).


When these ASIs are used with the LDDFA (STDFA) opcode for Block Load (Store),
a mem_address_not_aligned exception is generated if the operand address is not
64-byte aligned.


If a Block Load or Block Store ASI is used with any other Load Alternate, Store
Alternate, Atomic Load-Store Alternate, or PREFETCHA instruction, a
data_access_exception exception is always generated and
mem_address_not_aligned is not generated.


10.4.11 Partial Store ASIs


ASIs C0₁₆-C5₁₆ and C8₁₆-CD₁₆ exist for use with the STDFA instruction as Partial
Store (STPARTIALF) operations (see Store Partial Floating-Point on page 329).


When these ASIs are used with STDFA for Partial Store, a
mem_address_not_aligned exception is generated if the operand address is not
8-byte aligned, and an illegal_instruction exception is generated if i = 1 in the
instruction and the ASI register contains one of the Partial Store ASIs.


If one of these ASIs is used with a Store Alternate instruction other than STDFA, a
Load Alternate, Atomic Load-Store Alternate, or PREFETCHA
instruction, a data_access_exception exception is generated and
mem_address_not_aligned, LDDF_mem_address_not_aligned, and
illegal_instruction (for i = 1) are not generated.


ASIs C0₁₆-C5₁₆ and C8₁₆-CD₁₆ are only defined for use in Partial Store operations
(see page 329). None of them should be used with LDDFA; however, if any of those
ASIs is used with LDDFA, the resulting behavior is specified in the LDDFA
instruction description on page 241.


10.4.12 Short Floating-Point Load and Store ASIs


ASIs D0₁₆-D3₁₆ and D8₁₆-DB₁₆ exist for use with the LDDFA and STDFA
instructions as Short Floating-point Load and Store operations (see Load Floating-
Point Register on page 236 and Store Floating-Point on page 321).


When ASI D2₁₆, D3₁₆, DA₁₆, or DB₁₆ is used with LDDFA (STDFA) for a 16-bit Short
Floating-point Load (Store), a mem_address_not_aligned exception is generated if
the operand address is not halfword-aligned.


If any of these ASIs are used with any other Load Alternate, Store Alternate, Atomic
Load-Store Alternate, or PREFETCHA instruction, a data_access_exception
exception is always generated and mem_address_not_aligned is not generated.





10.5 ASI-Accessible Registers


This section describes the Data Watchpoint registers and the scratchpad registers.


A list of UltraSPARC Architecture 2005 ASIs is shown in TABLE 10-1 on page 401.


10.5.1 Privileged Scratchpad Registers
(ASI_SCRATCHPAD)


An UltraSPARC Architecture virtual processor includes eight Scratchpad registers 
(64 bits each, read/write accessible) (impl.dep. #302-U4-Cs10). The use of the 
Scratchpad registers is completely defined by software. 


For conventional uses of Scratchpad registers, see “Scratchpad Register Usage” in 
Software Considerations, contained in the separate volume UltraSPARC Architecture 
Application Notes. 


The Scratchpad registers are intended to be used by performance-critical trap 
handler code. 


The addresses of the privileged scratchpad registers are defined in TABLE 10-7. 


TABLE 10-7 Scratchpad Registers

Assembly Language                                    Privileged Scratchpad
ASI Name             ASI #     Virtual Address       Register #

ASI_SCRATCHPAD       20₁₆      00₁₆                  0
                               08₁₆                  1
                               10₁₆                  2
                               18₁₆                  3
                               20₁₆                  4
                               28₁₆                  5
                               30₁₆                  6
                               38₁₆                  7


IMPL. DEP. #404-S10: The degree to which Scratchpad registers 4-7 are accessible to 
privileged software is implementation dependent. Each may be 

(1) fully accessible, 

(2) accessible, with access much slower than to scratchpad registers 0-3, or 

(3) inaccessible (cause a data_access_exception).
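
As a reading aid for TABLE 10-7, the virtual addresses of the eight scratchpad registers are spaced 8 bytes apart within ASI_SCRATCHPAD. The Python sketch below expresses that mapping; the function name is hypothetical, and the implementation-dependent accessibility of registers 4-7 is not modeled.

```python
ASI_SCRATCHPAD = 0x20  # ASI number from TABLE 10-7


def scratchpad_va(register_number):
    """Virtual address of privileged scratchpad register 0..7."""
    if not 0 <= register_number <= 7:
        raise ValueError("scratchpad register number must be 0..7")
    # Each register is 64 bits (8 bytes) wide, so addresses step by 8.
    return register_number * 8
```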


V9 Compatibility | Privileged scratchpad registers are an UltraSPARC Architecture
Note | extension to SPARC V9.





10.5.2 ASI Changes in the UltraSPARC Architecture


The following Compatibility Notes summarize the ASI changes in the
UltraSPARC Architecture.

Compatibility | The names of several ASIs used in earlier UltraSPARC
Note | implementations have changed in UltraSPARC Architecture. Their
functions have not changed; just their names have changed.

ASI#    Previous UltraSPARC                      UltraSPARC Architecture

14₁₆    ASI_PHYS_USE_EC                          ASI_REAL
15₁₆    ASI_PHYS_BYPASS_EC_WITH_EBIT             ASI_REAL_IO
1C₁₆    ASI_PHYS_USE_EC_LITTLE                   ASI_REAL_LITTLE
        (ASI_PHYS_USE_EC_L)
1D₁₆    ASI_PHYS_BYPASS_EC_WITH_EBIT_LITTLE      ASI_REAL_IO_LITTLE
        (ASI_PHYS_BYPASS_EC_WITH_EBIT_L)









Compatibility | The names and ASI assignments (but not functions) changed 
Note | between earlier UltraSPARC implementations and UltraSPARC 
Architecture, for the following ASIs: 


Previous UltraSPARC                       UltraSPARC Architecture
ASI#    Name                              ASI#    Name

24₁₆    ASI_NUCLEUS_QUAD_LDD              27₁₆    ASI_TWINX_NUCLEUS
                                                  (ASI_TWINX_N)
2C₁₆    ASI_NUCLEUS_QUAD_LDD_LITTLE       2F₁₆    ASI_TWINX_NUCLEUS_LITTLE
        (ASI_NUCLEUS_QUAD_LDD_L)                  (ASI_TWINX_NL)

CHAPTER 11

Performance Instrumentation





This chapter describes the architecture for performance monitoring hardware on 
UltraSPARC Architecture processors. The architecture is based on the design of 
performance instrumentation counters in previous UltraSPARC Architecture 
processors, with an extension for the selective sampling of instructions. 





11.1 High-Level Requirements


11.1.1 Usage Scenarios


The performance monitoring hardware on UltraSPARC Architecture processors 
addresses the needs of various kinds of users. There are four scenarios envisioned: 


■ System-wide performance monitoring. In this scenario, someone skilled in system
performance analysis (e.g., a Systems Engineer) is using analysis tools to evaluate
the performance of the entire system. An example of such a tool is cpustat. The
objective is to obtain performance data relating to the configuration and behavior
of the system, e.g., the utilization of the memory system.


■ Self-monitoring of performance by the operating system. In this scenario the OS is
gathering performance data in order to tune the operation of the system. Some
examples might be:

▪ (a) determining whether the processors in the system should be running in
single- or multi-stranded mode.

▪ (b) determining the affinity of a process to a processor by examining that
process's memory behavior.


■ Performance analysis of an application by a developer. In this scenario a developer is
trying to optimize the performance of a specific application, by altering the source
code of the application or the compilation options. The developer needs to know
the performance characteristics of the components of the application at a coarse
grain, and where these are problematic, to be able to determine fine-grained
performance information. Using this information, the developer will alter the
source or compilation parameters, re-run the application, and observe the new
performance characteristics. This process is repeated until performance is
acceptable, or no further improvements can be found.


An example might be that a loop nest is measured to be not performing well.
Upon closer inspection, the developer determines that the loop has poor cache
behavior, and upon more detailed inspection finds a specific operation which
repeatedly misses the cache. Reorganizing the code and/or data may improve the
cache behavior.


■ Monitoring of an application's performance, e.g., by a Java Virtual Machine. In this
scenario the application is not executing directly on the hardware, but its 
execution is being mediated by a piece of system software, which for the purposes 
of this document is called a Virtual Machine. This may be a Java VM, or a binary 
translation system running software compiled for another architecture, or for an 
earlier version of the UltraSPARC Architecture. One goal of the VM is to optimize 
the behavior of the application by monitoring its performance and dynamically 
reorganizing the execution of the application (e.g., by selective recompilation of 
the application). 


This scenario differs from the previous one principally in the time allowed to 
gather performance data. Because the data are being gathered during the 
execution of the program, the measurements must not adversely affect the 
performance of the application by more than, say, a few percent, and must yield 
insight into the performance of the application in a relatively short time 
(otherwise, optimization opportunities are deferred for too long). This implies an 
observation mechanism which is of very low overhead, so that many observations 
can be made in a short time. 


In contrast, a developer optimizing an application has the luxury of running or 
re-running the application for a considerable period of time (minutes or even 
hours) to gather data. However, the developer will also expect a level of precision 
and detail in the data which would overwhelm a virtual machine, so the accuracy 
of the data required by a virtual machine need not be as high as that supplied to 
the developer. 


Scenarios 1 and 2 are adequately dealt with by a suitable set of performance 
counters capable of counting a variety of performance-related events. Counters are 
ideal for these situations because they provide low-overhead statistics without any 
intrusion into the behavior of the system or disruption to the code being monitored. 
However, counters may not adequately address the latter two scenarios, in which 
detailed and timely information is required at the level of individual instructions. 
Therefore, UltraSPARC Architecture processors may also implement an instruction 
sampling mechanism. 

11.1.3


Metrics 


There are two classes of data reported by a performance instrumentation 
mechanism: 


■ Architectural performance metrics. These are metrics related to the observable
execution of code at the architectural level (UltraSPARC Architecture). Examples
include:

▪ The number of instructions executed
▪ The number of floating point instructions executed
▪ The number of conditional branch instructions executed


■ Implementation performance metrics. These describe the behavior of the
microprocessor in terms of its implementation, and would not necessarily apply
to another implementation of the architecture.


In optimizing the performance of an application or system, attention will first be 
paid to the first class of metrics, and so these are more important. Only in 
performance-critical cases would the second class receive attention, since using these 
metrics requires a fairly extensive understanding of the specific implementation of 
the UltraSPARC Architecture. 


Accuracy Requirements 


Accuracy requirements for performance instrumentation vary depending on the 
scenario. The requirements are complicated by the possibly speculative nature of 
UltraSPARC Architecture processor implementations. For example, an 
implementation may include in its cache miss statistics the misses induced by 
speculative executions which were subsequently flushed, or provide two separate 
statistics, one for the misses induced by flushed instructions and one for misses 
induced by retired instructions. Although the latter would be desirable, the 
additional implementation complexity of associating events with specific 
instructions is significant, and so all events may be counted without distinction. The 
instruction sampling mechanism may distinguish between instructions that retired 
and those that were flushed, in which case sampling can be used to obtain statistical 
estimates of the frequencies of operations induced by mis-speculation. 


For critical performance measurements, architectural event counts must be accurate 
to a high degree (1 part in 10°). Which counters are considered performance-critical 
(and therefore accurate to 1 part in 10°) are specified in implementation-specific 
documentation. 


Implementation event counts must be accurate to 1 part in 10°, not including the 
speculative effects mentioned above. An upper bound on counter skew must be 
stated in implementation-specific documentation. 

Programming | Increasing the time between counter reads will mitigate the
Note | inaccuracies that could be introduced by counter skew (due to
speculative effects).





11.2 Performance Counters and Controls


The performance instrumentation hardware provides performance instrumentation 
counters (PICs). The number and size of performance counters is implementation 
dependent, but each performance counter register contains at least one 32-bit 
counter. It is implementation dependent whether the performance counter registers 
are accessed as ASRs or are accessed through ASIs. 


There are one or more performance counter control registers (PCRs) associated with 
the counter registers. It is implementation dependent whether the PCRs are accessed 
as ASRs or are accessed through ASIs. 


Each counter in a counter register can count one kind of event at a time. The 
number of the kinds of events that can be counted is implementation dependent. 
For each performance counter register, the corresponding control register is used to 
select the event type being counted. A counter is incremented whenever an event of 
the matching type occurs. A counter may be incremented by an event caused by an 
instruction which is subsequently flushed (for example, due to mis-speculation). 
Counting of events may be controlled based on privilege mode or on the strand in 
which they occur. Masking may be provided to allow counting of subgroups of 
events (for example, various occurrences of different opcode groups). 


A field that indicates when a counter has overflowed must be present in either each 
performance instrumentation counter or in a PCR. 


Performance counters are usually provided on a per-strand basis. 


11.2.1 Counter Overflow 


Overflow of a counter is recorded in the overflow-indication field of either the
counter register itself or a separate performance counter control register.


Counter overflow indication is provided so that large counts can be maintained in 
software, beyond the range directly supported in hardware. The counters continue 
to count after an overflow, and software can utilize the overflow indicators to 
maintain additional high-order bits. 
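
The way software can extend a hardware counter with the overflow indicator can be sketched as follows. This Python model is a minimal illustration, not a description of any implementation: the class and method names are hypothetical, the hardware counter is assumed to be 32 bits wide, and it is assumed that the caller reads (and then clears) the overflow indicator often enough that at most one overflow occurs between reads.

```python
class ExtendedCounter:
    """Software-maintained wide count on top of a narrower hardware counter."""

    def __init__(self, hw_bits=32):
        self.hw_bits = hw_bits
        self.high = 0  # high-order bits kept in software

    def update(self, hw_value, overflowed):
        """Combine a raw counter read with its overflow indication.

        hw_value:   value read from the hardware counter
        overflowed: True if the overflow indicator was set since the
                    previous read (assumed cleared by the caller)
        """
        if overflowed:
            # The hardware counter wrapped once; carry into the
            # software-maintained high-order bits.
            self.high += 1
        return (self.high << self.hw_bits) | hw_value
```

Because the hardware counter keeps counting after an overflow, the low-order bits remain valid and only the carry has to be tracked in software.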

CHAPTER 12


Traps





A trap is a vectored transfer of control to software running in a privilege mode (see 
page 424) with (typically) greater privileges. A trap in nonprivileged mode can be 
delivered to privileged mode or hyperprivileged mode. A trap that occurs while 
executing in privileged mode can be delivered to privileged mode or 
hyperprivileged mode. 


The actual transfer of control occurs through a trap table that contains the first eight
instructions (32 instructions for clean_window, window spill, and window fill traps)
of each trap handler. The virtual base address of the trap table for traps to be 
delivered in privileged mode is specified in the Trap Base Address (TBA) register. 
The displacement within the table is determined by the trap type and the current 
trap level (TL). One-half of each table is reserved for hardware traps; the other half is 
reserved for software traps generated by Tcc instructions. 


A trap behaves like an unexpected procedure call. It causes the hardware to do the 
following: 


1. Save certain virtual processor state (such as program counters, CWP, ASI, CCR, 
PSTATE, and the trap type) on a hardware register stack. 
2. Enter privileged execution mode with a predefined PSTATE. 


3. Begin executing trap handler code in the trap vector. 


When the trap handler has finished, it uses either a DONE or RETRY instruction to 
return. 


A trap may be caused by a Tcc instruction, an instruction-induced exception, a reset, 
an asynchronous error, or an interrupt request not directly related to a particular 
instruction. The virtual processor must appear to behave as though, before executing 
each instruction, it determines if there are any pending exceptions or interrupt 
requests. If there are pending exceptions or interrupt requests, the virtual processor 
selects the highest-priority exception or interrupt request and causes a trap. 

Thus, an exception is a condition that makes it impossible for the virtual processor to
continue executing the current instruction stream without software intervention. A 
trap is the action taken by the virtual processor when it changes the instruction flow 
in response to the presence of an exception, interrupt, reset, or Tcc instruction. 


V9 Compatibility | Exceptions referred to as “catastrophic error exceptions” in the 
Note | SPARC V9 specification do not exist in the UltraSPARC 
Architecture; they are handled using normal error-reporting 
exceptions. (impl. dep. #31-V8-Cs10) 


An interrupt is a request for service presented to a virtual processor by an external 
device. 


Traps are described in these sections:

■ Virtual Processor Privilege Modes on page 424.
■ Virtual Processor States and Traps on page 426.
■ Trap Categories on page 426.
■ Trap Control on page 431.
■ Trap-Table Entry Addresses on page 432.
■ Trap Processing on page 443.
■ Exception and Interrupt Descriptions on page 445.
■ Register Window Traps on page 450.





12.1 Virtual Processor Privilege Modes 


An UltraSPARC Architecture virtual processor is always operating in a discrete 
privilege mode. The privilege modes are listed below in order of increasing 
privilege: 

■ Nonprivileged mode (also known as "user mode")

■ Privileged mode, in which supervisor (operating system) software primarily
operates

■ Hyperprivileged mode (not described in this document)


The virtual processor’s operating mode is determined by the state of two mode bits, 
as shown in TABLE 12-1. 


TABLE 12-1 Virtual Processor Privilege Modes 





PSTATE.priv Virtual Processor Privilege Mode 





0 Nonprivileged 
1 Privileged 

A trap is delivered to the virtual processor in either privileged mode or
hyperprivileged mode; in which mode the trap is delivered depends on:

■ Its trap type
■ The trap level (TL) at the time the trap is taken
■ The privilege mode at the time the trap is taken


Traps detected in nonprivileged and privileged mode can be delivered to the virtual 
processor in privileged mode or hyperprivileged mode. 


TABLE 12-4 on page 436 indicates in which mode each trap is processed, based on the 
privilege mode at which it was detected. 


A trap delivered to privileged mode uses the privileged-mode trap vector, based 
upon the TBA register. See Trap-Table Entry Address to Privileged Mode on page 433 for 
details. 


The maximum trap level at which privileged software may execute is MAXPTL
(which, on a virtual processor, is 2).


Notes | Execution in nonprivileged mode with TL > 0 is an invalid 
condition that privileged software should never allow to occur. 


FIGURE 12-1 shows how a virtual processor transitions between privilege modes, 
excluding transitions that can occur due to direct software writes to PSTATE.priv. In 
this figure, indicates a "trap destined for privileged mode" and indicates a 
“trap destined for hyperprivileged mode". 


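[Figure: transition diagram among the Nonprivileged, Privileged, and
Hyperprivileged modes. Recoverable labels: a condition comparing TL with
MAXPTL (2); "1: if (TSTATE[TL].PSTATE.priv = 0)"; "2: if (TSTATE[TL].PSTATE.priv = 1)".]

FIGURE 12-1 Virtual Processor Privilege Mode Transition Diagram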

12.2 Virtual Processor States and Traps


The value of TL affects the generated trap vector address. TL also determines where 
(that is, into which element of the TSTATE array) the states are saved. 


12.2.0.1 Usage of Trap Levels 


If MAXPTL = 2 in an UltraSPARC Architecture implementation, the trap levels might 
be used as shown in TABLE 12-2. 


TABLE 12-2 Typical Usage for Trap Levels

      Corresponding
TL    Execution Mode    Usage

0     Nonprivileged     Normal execution
1     Privileged        System calls; interrupt handlers; instruction emulation
2     Privileged        Window spill/fill handler








12.3 Trap Categories


An exception, error, or interrupt request can cause any of the following trap types:

■ Precise trap
■ Deferred trap
■ Disrupting trap
■ Reset trap


12.3.1 Precise Traps


A precise trap is induced by a particular instruction and occurs before any
program-visible state has been changed by the trap-inducing instruction. When a
precise trap occurs, several conditions must be true:

■ The PC saved in TPC[TL] points to the instruction that induced the trap and the
NPC saved in TNPC[TL] points to the instruction that was to be executed next.

■ All instructions issued before the one that induced the trap have completed
execution.

■ Any instructions issued after the one that induced the trap remain unexecuted.


426 UltraSPARC Architecture 2005 + Draft DO.9.2, 19 Jun 2008 


Among the actions that trap handler software might take when processing a precise
trap are:

■ Return to the instruction that caused the trap and reexecute it by executing a
RETRY instruction (PC ← old PC, NPC ← old NPC).

■ Emulate the instruction that caused the trap and return to the succeeding
instruction by executing a DONE instruction (PC ← old NPC,
NPC ← old NPC + 4).

■ Terminate the program or process associated with the trap.


12.3.2 Deferred Traps


A deferred trap is also induced by a particular instruction, but unlike a precise trap, a 
deferred trap may occur after program-visible state has been changed. Such state 
may have been changed by the execution of either the trap-inducing instruction 
itself or by one or more other instructions. 


There are two classes of deferred traps: 


■ Termination deferred traps: The instruction (usually a store) that caused the trap
has passed the retirement point of execution (the TPC has been updated to point 
to an instruction beyond the one that caused the trap). The trap condition is an 
error that prevents the instruction from completing and its results becoming 
globally visible. A termination deferred trap has high trap priority, second only to 
the priority of resets. 


Programming | Not enough state is saved for execution of the instruction stream 
Note | to resume with the instruction that caused the trap. Therefore, 
the trap handler must terminate the process containing the 
instruction that caused the trap. 


■ Restartable deferred traps: The program-visible state has been changed by the
trap-inducing instruction or by one or more other instructions after the
trap-inducing instruction.


SPARC V9 | A restartable deferred trap is the "deferred trap" defined in the 
Compatibility | SPARC V9 specification. 
Note 


The fundamental characteristic of a restartable deferred trap is that the state of the 
virtual processor on which the trap occurred may not be consistent with any precise 
point in the instruction sequence being executed on that virtual processor. When a 
restartable deferred trap occurs, TPC[TL] and TNPC[TL] contain a PC value and an 
NPC value, respectively, corresponding to a point in the instruction sequence being 
executed on the virtual processor. This PC may correspond to the trap-inducing 
instruction or it may correspond to an instruction following the trap-inducing 
instruction. With a restartable deferred trap, program-visible updates may be 
missing from instructions prior to the instruction to which TPC[TL] refers. The 
missing updates are limited to instructions in the range from (and including) the
actual trap-inducing instruction up to (but not including) the instruction to which 
TPC[TL] refers. By definition, the instruction to which TPC[TL] refers has not yet 
executed, therefore it cannot have any updates, missing or otherwise. 


With a restartable deferred trap there must exist sufficient information to report the 
error that caused the deferred trap. If system software can recover from the error 
that caused the deferred trap, then there must be sufficient information to generate a 
consistent state within the processor so that execution can resume. Included in that 
information must be an indication of the mode (nonprivileged, privileged, or 
hyperprivileged) in which the trap-inducing instruction was issued. 


How the information necessary for repairing the state to make it consistent is
maintained and how the state is repaired to a consistent state are implementation
dependent. It is also implementation dependent whether execution resumes at the
point of the trap-inducing instruction or at an arbitrary point between the
trap-inducing instruction and the instruction pointed to by the TPC[TL], inclusively.


Associated with a particular restartable deferred trap implementation, the following 
must exist: 


■ An instruction that causes a potentially outstanding restartable deferred trap
exception to be taken as a trap

■ Instructions with sufficient privilege to access the state information needed by
software to emulate the restartable deferred trap-inducing instruction and to
resume execution of the trapped instruction stream.


Programming | Resuming execution may require the emulation of instructions 
Note | that had not completed execution at the time of the restartable 
deferred trap, that is, those instructions in the deferred-trap 
queue. 


Software should resume execution starting at the instruction to
which TPC[TL] refers. Hardware should provide enough information for software to
recreate virtual processor state and update it to the point just before execution of the
instruction to which TPC[TL] refers. After software has updated virtual processor
state up to that point, it can then resume execution by issuing a RETRY instruction.


IMPL. DEP. #32-V8-Ms10: Whether any restartable deferred traps (and, possibly,
associated deferred-trap queues) are present is implementation dependent.

Among the actions software can take after a restartable deferred trap are these:

■ Emulate the instruction that caused the exception, emulate or cause to execute
any other execution-deferred instructions that were in an associated restartable
deferred trap state queue, and use RETRY to return control to the instruction at
which the deferred trap was invoked.

■ Terminate the program or process associated with the restartable deferred trap.

A deferred trap (of either of the two classes) is always delivered to the virtual
processor in hyperprivileged mode.


12.3.3 Disrupting Traps


12.3.3.1 Disrupting versus Precise and Deferred Traps 


A disrupting trap is caused by a condition (for example, an interrupt) rather than 
directly by a particular instruction. This distinguishes it from precise and deferred 
traps. 


When a disrupting trap has been serviced, trap handler software normally arranges 
for program execution to resume where it left off. This distinguishes disrupting traps 
from reset traps, since a reset trap vectors to a unique reset address and execution of 
the program that was running when the reset occurred is generally not expected to 
resume. 


When a disrupting trap occurs, the following conditions are true: 


1. The PC saved in TPC[TL] points to an instruction in the disrupted program 
stream and the NPC value saved in TNPC[TL] points to the instruction that was 
to be executed after that one. 


2. All instructions issued before the instruction indicated by TPC[TL] have 
retired. 


3. The instruction to which TPC[TL] refers and any instruction(s) that were 
issued after it remain unexecuted. 


A disrupting trap may be due to an interrupt request directly related to a 
previously-executed instruction; for example, when a previous instruction sets a bit 
in the SOFTINT register. 


12.3.3.2 Causes of Disrupting Traps 


A disrupting trap may occur due to either an interrupt request or an error not 
directly related to instruction processing. The source of an interrupt request may be 
either internal or external. An interrupt request can be induced by the assertion of a 
signal not directly related to any particular virtual processor or memory state, for 
example, the assertion of an “I/O done” signal. 


A condition that causes a disrupting trap persists until the condition is cleared. 


12.3.3.3 Conditioning of Disrupting Traps 


How disrupting traps are conditioned is affected by: 

■ The privilege mode in effect when the trap is outstanding, just before the trap is
actually taken (regardless of the privilege mode that was in effect when the
exception was detected).

■ The privilege mode for which delivery of the trap is destined


Outstanding in Nonprivileged or Privileged mode, destined for delivery in
Privileged mode. An outstanding disrupting trap condition in either
nonprivileged mode or privileged mode and destined for delivery to privileged
mode is held pending while the Interrupt Enable (ie) field of PSTATE is zero
(PSTATE.ie = 0). Interrupts of type interrupt_level_n are further conditioned by the
Processor Interrupt Level (PIL) register. An interrupt is held pending while either
PSTATE.ie = 0 or the condition's interrupt level is less than or equal to the level
specified in PIL. When delivery of this disrupting trap is enabled by PSTATE.ie = 1,
it is delivered to the virtual processor in privileged mode if TL < MAXPTL (2, in
UltraSPARC Architecture 2005 implementations).


Outstanding in Nonprivileged or Privileged mode, destined for delivery in 
Hyperprivileged mode. An outstanding disrupting trap condition detected while 
in either nonprivileged mode or privileged mode and destined for delivery in 
hyperprivileged mode is never masked; it is delivered immediately. 


The above is summarized in TABLE 12-3. 

TABLE 12-3 Conditioning of Disrupting Traps

                                                    Disposition of Disrupting Traps, based on privilege
Type of Disrupting     Current Virtual Processor    mode in which the trap is destined to be delivered
Trap Condition         Privilege Mode               Privileged                  Hyperprivileged

interrupt_level_n      Nonprivileged or             Held pending while          --
                       Privileged                   PSTATE.ie = 0 or
                                                    interrupt level ≤ PIL

All other disrupting   Nonprivileged or             Held pending while          Delivered
traps                  Privileged                   PSTATE.ie = 0               immediately
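
The conditioning rules summarized in TABLE 12-3 can be written as a single predicate. The Python sketch below is a reading aid under stated assumptions: the function and parameter names are hypothetical, privilege modes are passed as strings, and the additional TL < MAXPTL condition on delivery to privileged mode is not modeled.

```python
def disrupting_trap_held_pending(destined_mode, pstate_ie,
                                 is_interrupt_level_n=False,
                                 interrupt_level=0, pil=0):
    """Return True if the trap condition is held pending, False if deliverable.

    destined_mode: "privileged" or "hyperprivileged" (where the trap
                   is destined to be delivered)
    pstate_ie:     current value of PSTATE.ie (0 or 1)
    """
    if destined_mode == "hyperprivileged":
        # Never masked; delivered immediately.
        return False
    if pstate_ie == 0:
        # All disrupting traps destined for privileged mode are held.
        return True
    if is_interrupt_level_n and interrupt_level <= pil:
        # interrupt_level_n is further conditioned by PIL.
        return True
    return False
```

For example, with PSTATE.ie = 1 and PIL = 5, a level-5 interrupt stays pending while a level-6 interrupt can be delivered.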


12.3.3.4 Trap Handler Actions for Disrupting Traps 
Among the actions that trap-handler software might take to process a disrupting 
trap are: 


■ Use RETRY to return to the instruction at which the trap was invoked
(PC ← old PC, NPC ← old NPC).


■ Terminate the program or process associated with the trap.
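
The effect of RETRY on the program counters, and for contrast the effect of DONE as described for precise traps, can be modeled in a few lines. This Python sketch covers only the PC/NPC updates; "old PC" and "old NPC" are taken to be the values saved in TPC[TL] and TNPC[TL], the function names are hypothetical, and the other state these instructions restore is not modeled.

```python
MASK64 = (1 << 64) - 1  # addresses are modeled as 64-bit values


def retry(tpc, tnpc):
    """RETRY: re-execute the trapped instruction (PC <- old PC, NPC <- old NPC)."""
    return tpc, tnpc


def done(tpc, tnpc):
    """DONE: skip the trapped instruction (PC <- old NPC, NPC <- old NPC + 4)."""
    return tnpc, (tnpc + 4) & MASK64
```

A handler that has emulated the trapping instruction would use the DONE form; a handler for a disrupting trap, where the instruction at TPC[TL] has not executed, would use the RETRY form.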

12.3.4 Uses of the Trap Categories


The SPARC V9 trap model stipulates the following: 
1. Reset traps occur asynchronously to program execution. 


2. When recovery from an exception can affect the interpretation of subsequent 
instructions, such exceptions shall be precise. See TABLE 12-4, TABLE 12-5, and 
Exception and Interrupt Descriptions on page 445 for identification of which traps 
are precise. 


3. In an UltraSPARC Architecture implementation, all exceptions that occur as the 
result of program execution are precise (impl. dep. #33-V8-Cs10). 


4. An error detected after the initial access of a multiple-access load instruction (for 
example, LDTX or LDBLOCKF) should be precise. Thus, a trap due to the second 
memory access can occur. However, the processor state should not have been 
modified by the first access. 


5. Exceptions caused by external events unrelated to the instruction stream, such as 
interrupts, are disrupting. 


A deferred trap may occur one or more instructions after the trap-inducing 
instruction is dispatched. 





12.4 Trap Control


Several registers control how any given exception is processed, for example: 


■ The interrupt enable (ie) field in PSTATE and the Processor Interrupt Level (PIL)
register control interrupt processing. See Disrupting Traps on page 429 for details.


■ The enable floating-point unit (fef) field in FPRS, the floating-point unit enable
(pef) field in PSTATE, and the trap enable mask (tem) in the FSR control
floating-point traps.


■ The TL register, which contains the current level of trap nesting, affects whether
the trap is processed in privileged mode or hyperprivileged mode.


■ PSTATE.tle determines whether implicit data accesses in the trap handler routine
will be performed using big-endian or little-endian byte order.


Between the execution of instructions, the virtual processor prioritizes the 
outstanding exceptions, errors, and interrupt requests. At any given time, only the 
highest-priority exception, error, or interrupt request is taken as a trap. When there 
are multiple interrupts outstanding, the interrupt with the highest interrupt level is 
selected. When there are multiple outstanding exceptions, errors, and/or interrupt 
requests, a trap occurs based on the exception, error, or interrupt with the highest
priority (numerically lowest priority number in TABLE 12-5). See Trap Priorities on 
page 442. 
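
The selection rule described above amounts to two simple comparisons. In the Python sketch below, the function names, the tuple layout, and the example names and priority numbers are hypothetical; actual priority numbers are those given in TABLE 12-5.

```python
def select_interrupt(outstanding_levels):
    """Among outstanding interrupts, the highest interrupt level is selected."""
    return max(outstanding_levels)


def select_trap(pending):
    """Pick the pending condition with the highest priority.

    pending: list of (name, priority_number) tuples; the numerically
    lowest priority number is the highest priority.
    """
    return min(pending, key=lambda entry: entry[1])
```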


12.4.1 PIL Control 


When an interrupt request occurs, the virtual processor compares its interrupt 
request level against the value in the Processor Interrupt Level (PIL) register. If the 
interrupt request level is greater than PIL and no higher-priority exception is 
outstanding, then the virtual processor takes a trap using the appropriate 
interrupt_level_n trap vector. 


12.4.2 FSR.tem Control 


The occurrence of floating-point traps of type IEEE_754_exception can be controlled
with the user-accessible trap enable mask (tem) field of the FSR. If a particular bit of
FSR.tem is 1, the associated IEEE 754 exception can cause an
fp_exception_ieee_754 trap.


If a particular bit of FSR.tem is 0, the associated IEEE 754 exception does not cause
an fp_exception_ieee_754 trap. Instead, the occurrence of the exception is recorded
in the FSR's accrued exception field (aexc).


If an IEEE 754 exception results in an fp_exception_ieee_754 trap, then the
destination F register, FSR.fccn, and FSR.aexc fields remain unchanged. However,
if an IEEE 754 exception does not result in a trap, then the F register, FSR.fccn, and
FSR.aexc fields are updated to their new values.
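
The interaction between FSR.tem and FSR.aexc described above can be sketched as follows. This Python model is a simplification under stated assumptions: the function name is hypothetical, tem and aexc are treated as bit masks that use the same bit position for a given exception, and "recorded in aexc" is modeled as a bitwise OR (accumulation).

```python
def ieee_exception_outcome(tem, aexc, raised):
    """Model one IEEE 754 exception against the trap enable mask.

    tem:    current FSR.tem value (bit mask)
    aexc:   current FSR.aexc value (bit mask)
    raised: mask with the bit of the exception that occurred
    Returns (trap_taken, new_aexc).
    """
    if tem & raised:
        # Enabled: fp_exception_ieee_754 trap; aexc remains unchanged.
        return True, aexc
    # Masked: no trap; the exception is accrued in aexc.
    return False, aexc | raised
```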





12.5  Trap-Table Entry Addresses 


Traps are delivered to the virtual processor in either privileged mode or 
hyperprivileged mode, depending on the trap type, the value of TL at the time the 
trap is taken, and the privilege mode at the time the exception was detected. See 
TABLE 12-4 on page 436 and TABLE 12-5 on page 440 for details. 


Unique trap table base addresses are provided for traps being delivered in 
privileged mode and in hyperprivileged mode. 


12.5.1 Trap-Table Entry Address to Privileged Mode 


Privileged software initializes bits 63:15 of the Trap Base Address (TBA) register (its 
most significant 49 bits) with bits 63:15 of the desired 64-bit privileged trap-table 
base address. 


At the time a trap to privileged mode is taken: 

■ Bits 63:15 of the trap vector address are taken from TBA{63:15}. 

■ Bit 14 of the trap vector address (the "TL > 0" field) is set based on the value of TL 
just before the trap is taken; that is, if TL = 0 then bit 14 is set to 0 and if TL > 0 
then bit 14 is set to 1. 

■ Bits 13:5 of the trap vector address contain a copy of the contents of the TT 
register (TT[TL]). 

■ Bits 4:0 of the trap vector address are always 0; hence, each trap table entry is at 
least 2^5 or 32 bytes long. Each entry in the trap table may contain the first eight 
instructions of the corresponding trap handler. 
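
The bit fields listed above can be assembled as shown in this sketch. It is illustrative only; the function and parameter names are hypothetical and do not come from the specification.

```python
def privileged_trap_vector(tba, tl_before, trap_type):
    """Build the 64-bit trap vector address for a trap to privileged mode.

    tba:       contents of the TBA register (only bits 63:15 are used)
    tl_before: value of TL just before the trap is taken
    trap_type: 9-bit trap type written to TT[TL]
    """
    if not 0 <= trap_type <= 0x1FF:
        raise ValueError("TT is a 9-bit value")
    base = tba & ~0x7FFF & 0xFFFF_FFFF_FFFF_FFFF   # bits 63:15 from TBA
    tl_gt_0 = 1 if tl_before > 0 else 0            # bit 14, the "TL > 0" field
    return base | (tl_gt_0 << 14) | (trap_type << 5)  # bits 4:0 stay 0
```

With TBA = 0100 0000₁₆, a clean_window trap (TT = 024₁₆) taken from TL = 0 would vector to offset 480₁₆ in the table, and a spill trap with TT = 080₁₆ taken from TL = 1 would vector to offset 5000₁₆.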


FIGURE 12-2 illustrates the trap vector address for a trap delivered to privileged 
mode. In FIGURE 12-2, the "TL > 0" bit is 0 if TL = 0 when the trap was taken, and 1 if 
TL > 0 when the trap was taken. This implies, as detailed in the following section, 
that there are two trap tables for traps to privileged mode: one for traps from TL = 0 
and one for traps from TL > 0. 


  +----------------------------------+--------+--------+--------+
  | from TBA{63:15} (TBA.tba_high49) | TL > 0 | TT[TL] | 0 0000 |
  +----------------------------------+--------+--------+--------+
   63                              15    14    13     5  4     0


FIGURE 12-2 Privileged Mode Trap Vector Address 


12.5.2 Privileged Trap Table Organization 


The layout of the privileged-mode trap table (which is accessed using virtual 
addresses) is illustrated in FIGURE 12-3. 


Value of TL     Software     Hardware Trap    Trap Table Offset
(before trap)   Trap Type    Type (TT[TL])    (from TBA)           Contents of Trap Table

TL = 0          —            000₁₆-07F₁₆      0₁₆-FE0₁₆            Hardware traps
                —            080₁₆-0FF₁₆      1000₁₆-1FE0₁₆        Spill / fill traps
                0₁₆-7F₁₆     100₁₆-17F₁₆      2000₁₆-2FE0₁₆        Software traps to Privileged level
                —            180₁₆-1FF₁₆      3000₁₆-3FE0₁₆        unassigned

TL ≥ 1          —            000₁₆-07F₁₆      4000₁₆-4FE0₁₆        Hardware traps
(TL ≤           —            080₁₆-0FF₁₆      5000₁₆-5FE0₁₆        Spill / fill traps
MAXPTL−1)       0₁₆-7F₁₆     100₁₆-17F₁₆      6000₁₆-6FE0₁₆        Software traps to Privileged level
                —            180₁₆-1FF₁₆      7000₁₆-7FE0₁₆        unassigned


FIGURE 12-3 Privileged-mode Trap Table Layout 


The trap table for TL = 0 comprises 512 thirty-two-byte entries; the trap table for 
TL > 0 comprises 512 more thirty-two-byte entries. Therefore, the total size of a full 
privileged trap table is 2 x 512 x 32 bytes (32 Kbytes). However, if privileged 
software does not use software traps (Tcc instructions) at TL > 0, the table can be 
made 24 Kbytes long. 


12.5.3 Trap Type (TT) 


When a normal trap occurs, a value that uniquely identifies the type of the trap is 
written into the current 9-bit TT register (TT[TL]) by hardware. Control is then 
transferred into the trap table to an address formed according to the trap's 
destination privilege mode: 

■ The TBA register, (TL > 0), and TT[TL] (see Trap-Table Entry Address to Privileged 
Mode on page 433) 


TT values 000₁₆-0FF₁₆ are reserved for hardware traps. TT values 100₁₆-17F₁₆ are 
reserved for software traps (caused by execution of a Tcc instruction) to 
privileged-mode trap handlers. 


IMPL. DEP. #35-V8-Cs20: TT values 060₁₆ to 07F₁₆ were reserved for 
implementation_dependent_exception_n exceptions in the SPARC V9 specification, 
but are now all defined as standard UltraSPARC Architecture exceptions. See 
TABLE 12-4 for details. 


The assignment of TT values to traps is shown in TABLE 12-4; TABLE 12-5 provides the 
same list, but sorted in order of trap priority. The key to both tables follows: 





Symbol Meaning 





● This trap type is associated with a feature that is architecturally required in an 
implementation of UltraSPARC Architecture 2005. Hardware must detect this 
exception or interrupt, trap on it (if not masked), and set the specified trap type 
value in the TT register. 


○ This trap type is associated with a feature that is architecturally defined in 
UltraSPARC Architecture 2005, but its implementation is optional. 


P Trap is taken via the Privileged trap table, in Privileged mode (PSTATE.priv = 1) 
H Trap is taken in Hyperprivileged mode 


-X- Not possible. Hardware cannot generate this trap in the indicated running mode. 
For example, all privileged instructions can be executed in privileged mode, 
therefore a privileged_opcode trap cannot occur in privileged mode. 


— This trap is reserved for future use. 


(ie) When the outstanding disrupting trap condition occurs in this privilege mode, it 
may be conditioned (masked out) by PSTATE.ie = 0 (but remains pending). 


(nm) Never Masked — when the condition occurs in this running mode, it is never 
masked out and the trap is always taken. 


(pend) Held Pending: the condition can occur in this running mode, but can't be 
serviced in this mode. Therefore, it is held pending until the mode changes to 
one in which the exception can be serviced. 


TABLE 12-4 Exception and Interrupt Requests, by TT Value 

                                                                                               Mode in which Trap is Delivered
UA-2005                                                                                        (and Conditioning Applied),
●=Req'd.                                               TT             Trap        Priority        based on Current Privilege Mode
○=Opt'l  Exception or Interrupt Request                (Trap Type)    Category    (0 = Highest)*  NP        Priv

—        Reserved                                      000₁₆          —           —               —         —
●        (used at higher privilege levels)             001₁₆-005₁₆    —           —               —         —
—        Reserved                                      005₁₆          —           —               —         —
—        implementation-dependent                      006₁₆          —           —               —         —
●        instruction_access_exception                  008₁₆          precise     3               H         H
●        (used at higher privilege levels)             009₁₆          —           —               —         —
●        (used at higher privilege levels)             00A₁₆          —           —               —         —
—        Reserved                                      00B₁₆-00D₁₆    —           —               —         —
—        Reserved                                      00D₁₆-00E₁₆    —           —               —         —
●        (used at higher privilege levels)             00D₁₆          —           —               —         —
●        (used at higher privilege levels)             00E₁₆          —           —               —         —
—        Reserved                                      00F₁₆          —           —               —         —
●        illegal_instruction                           010₁₆          precise     6.2             H         H
●        privileged_opcode                             011₁₆          precise     7               P (nm)    -x-
—        Reserved                                      012₁₆-013₁₆    —           —               —         —
—        Reserved                                      014₁₆-017₁₆    —           —               —         —
—        Reserved                                      018₁₆-01F₁₆    —           —               —         —
●        fp_disabled                                   020₁₆          precise     8               P (nm)    P (nm)
○        fp_exception_ieee_754                         021₁₆          precise     11.1            P (nm)    P (nm)
○        fp_exception_other                            022₁₆          precise     11.1            P (nm)    P (nm)
●        tag_overflow (D)                              023₁₆          precise     14              P (nm)    P (nm)
●        clean_window                                  024₁₆†         precise     10.1            P (nm)    P (nm)
—        Reserved                                      025₁₆-027₁₆    —           —               —         —
●        division_by_zero                              028₁₆          precise     15              P (nm)    P (nm)
—        Reserved                                      02C₁₆          —           —               —         —
—        Reserved                                      02D₁₆-02F₁₆    —           —               —         —
●        data_access_exception                         030₁₆          precise     12.01           H         H
—        Reserved                                      032₁₆          —           —               —         —
●        mem_address_not_aligned                       034₁₆          precise     10.2            H         H
●        LDDF_mem_address_not_aligned                  035₁₆          precise     10.1            H         H
●        STDF_mem_address_not_aligned                  036₁₆          precise     10.1            H         H
●        privileged_action                             037₁₆          precise     11.1            H         H
○        LDQF_mem_address_not_aligned                  038₁₆          precise     10.1            H         H
○        STQF_mem_address_not_aligned                  039₁₆          precise     10.1            H         H
—        Reserved                                      03A₁₆          —           —               —         —
—        Reserved                                      03B₁₆          —           —               —         —
—        Reserved                                      03B₁₆-03D₁₆    —           —               —         —
—        Reserved                                      040₁₆          —           —               —         —
●        interrupt_level_n (n = 1-15)                  041₁₆-04F₁₆    disrupting  32-n (31 to 17) P (ie)    P (ie)
—        Reserved                                      050₁₆-05D₁₆    —           —               —         —
●        (used at higher privilege levels)             05F₁₆-061₁₆    —           —               —         —
—        Reserved                                      060₁₆          —           —               —         —
—        Reserved                                      062₁₆          —           —               —         —
●        VA_watchpoint                                 062₁₆          precise     11.2            P (nm)    P (nm)
●        (used at higher privilege levels)             063₁₆-06C₁₆    —           —               —         —
—        Reserved                                      06D₁₆-06F₁₆    —           —               —         —
○        implementation_dependent_exception_n          070₁₆-075₁₆    —           V               —         —
         (impl. dep. #35-V8-Cs20)
○        implementation_dependent_exception_n          077₁₆          —           V               —         —
         (impl. dep. #35-V8-Cs20)
○        implementation_dependent_exception_n          079₁₆-07B₁₆    —           V               —         —
         (impl. dep. #35-V8-Cs20)
—        Reserved                                      079₁₆          —           —               —         —
●        cpu_mondo                                     07C₁₆          disrupting  16.08           P (ie)    P (ie)
●        dev_mondo                                     07D₁₆          disrupting  16.11           P (ie)    P (ie)
●        resumable_error                               07E₁₆          disrupting  33.3            P (ie)    P (ie)
○        implementation_dependent_exception_15         07F₁₆          —           V               —         —
         (impl. dep. #35-V8-Cs20)
—        nonresumable_error                            07F₁₆          —           —               —         —
●        spill_n_normal (n = 0-7)                      080₁₆-09C₁₆†   precise     9               P (nm)    P (nm)
         (reserved for use by spill_7_normal;          09D₁₆-09F₁₆    —           —               —         —
         see footnote for trap type 09C₁₆)
●        spill_n_other (n = 0-7)                       0A0₁₆-0BC₁₆†   precise     9               P (nm)    P (nm)
         (reserved for use by spill_7_other;           0BD₁₆-0BF₁₆    —           —               —         —
         see footnote for trap type 0BC₁₆)
●        fill_n_normal (n = 0-7)                       0C0₁₆-0DC₁₆†   precise     9               P (nm)    P (nm)
         (reserved for use by fill_7_normal;           0DD₁₆-0DF₁₆    —           —               —         —
         see footnote for trap type 0DC₁₆)
●        fill_n_other (n = 0-7)                        0E0₁₆-0FC₁₆†   precise     9               P (nm)    P (nm)
         (reserved for use by fill_7_other;            0FD₁₆-0FF₁₆    —           —               —         —
         see footnote for trap type 0FC₁₆)
●        trap_instruction                              100₁₆-17F₁₆    precise     16.02           P (nm)    P (nm)
●        htrap_instruction                             180₁₆-1FF₁₆    precise     16.02           -x-       H


* Although these trap priorities are recommended, all trap priorities are implementation dependent (impl. dep. #36-V8 on page 
442), including relative priorities within a given priority level. 


† The trap vector entry (32 bytes) for this trap type plus the next three trap types (total of 128 bytes) are permanently reserved for 
this exception. 

V The priority of an implementation_dependent_exception_n trap is implementation dependent (impl. dep. #35-V8-Cs20) 
D This exception is deprecated, because the only instructions that can generate it have been deprecated. 


TABLE 12-5 Exception and Interrupt Requests, by Priority 

UA-2005                                                                                        Mode in which Trap is Delivered
●=Req'd.                                                                                       (and Conditioning Applied),
○=Opt'l                                                TT             Trap        Priority        based on Current Privilege Mode
□=Impl-Specific  Exception or Interrupt Request        (Trap Type)    Category    (0 = Highest)*  NP        Priv

●        instruction_access_exception                  008₁₆          precise     3               H         H
●        illegal_instruction                           010₁₆          precise     6.2             H         H
●        privileged_opcode                             011₁₆          precise     7               P (nm)    -x-
●        fp_disabled                                   020₁₆          precise     8               P (nm)    P (nm)
●        spill_n_normal (n = 0-7)                      080₁₆-09C₁₆†   precise     9               P (nm)    P (nm)
●        spill_n_other (n = 0-7)                       0A0₁₆-0BC₁₆†   precise     9               P (nm)    P (nm)
●        fill_n_normal (n = 0-7)                       0C0₁₆-0DC₁₆†   precise     9               P (nm)    P (nm)
●        fill_n_other (n = 0-7)                        0E0₁₆-0FC₁₆†   precise     9               P (nm)    P (nm)
●        clean_window                                  024₁₆†         precise     10.1            P (nm)    P (nm)
●        LDDF_mem_address_not_aligned                  035₁₆          precise     10.1            H         H
●        STDF_mem_address_not_aligned                  036₁₆          precise     10.1            H         H
○        LDQF_mem_address_not_aligned                  038₁₆          precise     10.1            H         H
○        STQF_mem_address_not_aligned                  039₁₆          precise     10.1            H         H
●        mem_address_not_aligned                       034₁₆          precise     10.2            H         H
○        fp_exception_other                            022₁₆          precise     11.1            P (nm)    P (nm)
○        fp_exception_ieee_754                         021₁₆          precise     11.1            P (nm)    P (nm)
●        privileged_action                             037₁₆          precise     11.1            H         H
●        VA_watchpoint                                 062₁₆          precise     11.2            P (nm)    P (nm)
●        data_access_exception                         030₁₆          precise     12.01           H         H
●        tag_overflow (D)                              023₁₆          precise     14              P (nm)    P (nm)
●        division_by_zero                              028₁₆          precise     15              P (nm)    P (nm)
●        trap_instruction                              100₁₆-17F₁₆    precise     16.02           P (nm)    P (nm)
●        htrap_instruction                             180₁₆-1FF₁₆    precise     16.02           -x-       H
●        cpu_mondo                                     07C₁₆          disrupting  16.08           P (ie)    P (ie)
●        dev_mondo                                     07D₁₆          disrupting  16.11           P (ie)    P (ie)
●        interrupt_level_n (n = 1-15)                  041₁₆-04F₁₆    disrupting  32-n (31 to 17) P (ie)    P (ie)
●        resumable_error                               07E₁₆          disrupting  33.3            P (ie)    P (ie)
○        implementation_dependent_exception_n          070₁₆-075₁₆,   —           V               —         —
         (impl. dep. #35-V8-Cs20)                      077₁₆,
                                                       079₁₆-07B₁₆,
                                                       07F₁₆
—        nonresumable_error                            07F₁₆


* Although these trap priorities are recommended, all trap priorities are implementation dependent (impl. dep. #36-V8 on 
page 442), including relative priorities within a given priority level. 


† The trap vector entry (32 bytes) for this trap type plus the next three trap types (total of 128 bytes) are permanently reserved 
for this exception. 


V The priority of an implementation_dependent_exception_n trap is implementation dependent (impl. dep. #35-V8-Cs20) 

D This exception is deprecated, because the only instructions that can generate it have been deprecated. 


12.5.3.1 Trap Type for Spill/Fill Traps 


The trap type for window spill/fill traps is determined on the basis of the contents of 
the OTHERWIN and WSTATE registers as described below and shown in FIGURE 12-4. 


Bit    Field           Description 

8:6    spill_or_fill   010₂ for spill traps; 011₂ for fill traps 

5      other           (OTHERWIN ≠ 0) 

4:2    wtype           If (other) then WSTATE.other; else WSTATE.normal 


  +---------------+-------+-------+-----+
  | spill_or_fill | other | wtype | 0 0 |
  +---------------+-------+-------+-----+
   8             6    5    4     2 1   0


FIGURE 12-4 Trap Type Encoding for Spill/Fill Traps 
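
The encoding in FIGURE 12-4 can be sketched as follows. The function and parameter names are hypothetical, and bits 1:0 are taken to be zero because the spill and fill trap types in TABLE 12-4 advance in steps of four.

```python
def window_trap_type(is_fill, otherwin, wstate_other, wstate_normal):
    """Compute the 9-bit trap type for a window spill or fill trap.

    is_fill:       True for a fill trap, False for a spill trap
    otherwin:      contents of the OTHERWIN register
    wstate_other:  WSTATE.other (3 bits)
    wstate_normal: WSTATE.normal (3 bits)
    """
    spill_or_fill = 0b011 if is_fill else 0b010    # bits 8:6
    other = 1 if otherwin != 0 else 0              # bit 5
    wtype = wstate_other if other else wstate_normal  # bits 4:2
    if not 0 <= wtype <= 7:
        raise ValueError("WSTATE fields are 3 bits wide")
    return (spill_or_fill << 6) | (other << 5) | (wtype << 2)
```

For instance, a spill with OTHERWIN = 0 and WSTATE.normal = 0 yields 080₁₆ (spill_0_normal), and a fill with OTHERWIN = 0 and WSTATE.normal = 7 yields 0DC₁₆ (fill_7_normal).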


12.5.4 Trap Priorities 


TABLE 12-4 on page 436 and TABLE 12-5 on page 440 show the assignment of traps to 
TT values and the relative priority of traps and interrupt requests. A trap priority is 
an ordinal number, with 0 indicating the highest priority and greater priority 
numbers indicating decreasing priority; that is, if x < y, a pending exception or 
interrupt request with priority x is taken instead of a pending exception or interrupt 
request with priority y. Traps within the same priority class (0 to 33) are listed in 
priority order in TABLE 12-5 (impl. dep. #36-V8). 
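
The selection rule can be illustrated with a short sketch. It is a simplified model, not the architecture's arbitration logic: pending requests are represented as hypothetical (name, priority) pairs using the recommended priorities from TABLE 12-5, and ties are resolved by list order, whereas actual relative priorities within a level are implementation dependent.

```python
def select_trap(pending):
    """Pick the pending exception or interrupt request that is taken as a trap.

    pending: non-empty list of (name, priority) pairs; a numerically lower
             priority value means a higher priority.
    """
    if not pending:
        raise ValueError("no exception or interrupt request is pending")
    return min(pending, key=lambda request: request[1])
```

Given pending division_by_zero (15), mem_address_not_aligned (10.2), and fp_disabled (8), the sketch selects fp_disabled.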


IMPL. DEP. #36-V8: The relative priorities of traps defined in the UltraSPARC 
Architecture are fixed. However, the absolute priorities of those traps are 
implementation dependent (because a future version of the architecture may define 
new traps). The priorities (both absolute and relative) of any new traps are 
implementation dependent. 


However, the TT values for the exceptions and interrupt requests shown in 
TABLE 12-4 and TABLE 12-5 must remain the same for every implementation. 


The trap priorities given above always need to be considered within the context of 
how the virtual processor actually issues and executes instructions. 


12.6 Trap Processing 


The virtual processor's action during trap processing depends on various virtual 
processor states, including the trap type, the current level of trap nesting (given in 
the TL register), and PSTATE. When a trap occurs, the GL register is normally 
incremented by one (described later in this section), which replaces the set of eight 
global registers with the next consecutive set. 


During normal operation, the virtual processor is in execute_state. It processes 
traps in execute_state and continues. 


TABLE 12-6 describes the virtual processor mode and trap-level transitions involved 
in handling traps. 


TABLE 12-6 Trap Received While in execute_state 

Original State        New State, After Receiving Trap or Interrupt 

execute_state         execute_state 
TL < MAXPTL − 1       TL ← TL + 1 


12.6.1 Normal Trap Processing 


A trap is delivered in either privileged mode or hyperprivileged mode, depending 
on the type of trap, the trap level (TL), and the privilege mode in effect when the 
exception was detected. 


During normal trap processing, the following state changes occur (conceptually, in 
this order): 


■ The trap level is updated. This provides access to a fresh set of privileged trap- 
state registers used to save the current state, in effect, pushing a frame on the trap 
stack. 

    TL ← TL + 1 

■ Existing state is preserved. 

    TSTATE[TL].gl     ← GL 
    TSTATE[TL].ccr    ← CCR 
    TSTATE[TL].asi    ← ASI 
    TSTATE[TL].pstate ← PSTATE 
    TSTATE[TL].cwp    ← CWP 
    TPC[TL]           ← PC     // (upper 32 bits zeroed if PSTATE.am = 1) 
    TNPC[TL]          ← NPC    // (upper 32 bits zeroed if PSTATE.am = 1) 

■ The trap type is preserved. 

    TT[TL] ← the trap type 

■ The Global Level register (GL) is updated. This normally provides access to a 
fresh set of global registers: 

    GL ← min (GL + 1, MAXPGL) 

■ The PSTATE register is updated to a predefined state: 

    PSTATE.mm is unchanged 
    PSTATE.pef  ← 1            // if an FPU is present, it is enabled 
    PSTATE.am   ← 0            // address masking is turned off 
    PSTATE.priv ← 1            // the virtual processor enters privileged mode 
    PSTATE.cle  ← PSTATE.tle   // set endian mode for traps 
    PSTATE.ie   ← 0            // interrupts are disabled 
    PSTATE.tle is unchanged 
    PSTATE.tct  ← 0            // trap on CTI disabled 

■ For a register-window trap (clean window, window spill, or window fill) only, 
CWP is set to point to the register window that must be accessed by the trap- 
handler software, that is: 

    if TT[TL] = 024₁₆                      // a clean_window trap 
        then CWP ← CWP + 1 
    endif 

    if (080₁₆ ≤ TT[TL] ≤ 0BF₁₆)            // window spill trap 
        then CWP ← CWP + CANSAVE + 2 
    endif 

    if (0C0₁₆ ≤ TT[TL] ≤ 0FF₁₆)            // window fill trap 
        then CWP ← CWP − 1 
    endif 

For non-register-window traps, CWP is not changed. 

■ Control is transferred into the trap table: 

    // Note that at this point, TL has already been incremented (above) 
    if ( (trap is to privileged mode) and (TL ≤ MAXPTL) ) 
        then 
            // the trap is handled in privileged mode 
            // Note: The expression "(TL > 1)" below evaluates to the 
            // value 0₂ if TL was 0 just before the trap (in which 
            // case, TL = 1 now, since it was incremented above, 
            // during trap entry). "(TL > 1)" evaluates to 1₂ if 
            // TL was > 0 before the trap. 
            PC  ← TBA{63:15} :: (TL > 1) :: TT[TL] :: 0 0000₂ 
            NPC ← TBA{63:15} :: (TL > 1) :: TT[TL] :: 0 0100₂ 
        else { trap is handled in hyperprivileged mode } 
    endif 
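
The sequence above can be exercised with a small software model. This is a sketch under stated assumptions rather than a definitive implementation: the class and function names are hypothetical, the MAXPTL, MAXPGL, and window-count values are illustrative, CWP arithmetic is assumed to wrap modulo the number of windows, and delivery in hyperprivileged mode is not modeled.

```python
from dataclasses import dataclass, field

MAXPTL = 2       # assumption: illustrative limit, implementation-specific
MAXPGL = 2       # assumption: illustrative limit, implementation-specific
N_WINDOWS = 8    # assumption: CWP arithmetic wraps modulo the window count
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


@dataclass
class VirtualProcessor:
    tba: int = 0
    tl: int = 0
    gl: int = 0
    pc: int = 0
    npc: int = 4
    ccr: int = 0
    asi: int = 0
    cwp: int = 0
    cansave: int = 0
    pstate: dict = field(default_factory=lambda: dict(
        mm=0, pef=0, am=0, priv=0, cle=0, tle=0, ie=1, tct=0))
    tstate: dict = field(default_factory=dict)
    tpc: dict = field(default_factory=dict)
    tnpc: dict = field(default_factory=dict)
    tt: dict = field(default_factory=dict)


def take_privileged_trap(cpu, trap_type):
    """Apply the listed state changes for a trap delivered to privileged mode."""
    if cpu.tl + 1 > MAXPTL:
        raise NotImplementedError("hyperprivileged delivery is not modeled")
    # 1. Update the trap level.
    cpu.tl += 1
    tl = cpu.tl
    # 2. Preserve existing state in the trap-state registers for this level.
    cpu.tstate[tl] = dict(gl=cpu.gl, ccr=cpu.ccr, asi=cpu.asi,
                          pstate=dict(cpu.pstate), cwp=cpu.cwp)
    addr_mask = 0xFFFF_FFFF if cpu.pstate["am"] else MASK64
    cpu.tpc[tl] = cpu.pc & addr_mask
    cpu.tnpc[tl] = cpu.npc & addr_mask
    # 3. Preserve the trap type.
    cpu.tt[tl] = trap_type
    # 4. Update GL.
    cpu.gl = min(cpu.gl + 1, MAXPGL)
    # 5. Set PSTATE to its predefined state (mm and tle are unchanged).
    cpu.pstate.update(pef=1, am=0, priv=1, cle=cpu.pstate["tle"], ie=0, tct=0)
    # 6. Adjust CWP for register-window traps only.
    if trap_type == 0x024:                      # clean_window
        cpu.cwp = (cpu.cwp + 1) % N_WINDOWS
    elif 0x080 <= trap_type <= 0x0BF:           # window spill
        cpu.cwp = (cpu.cwp + cpu.cansave + 2) % N_WINDOWS
    elif 0x0C0 <= trap_type <= 0x0FF:           # window fill
        cpu.cwp = (cpu.cwp - 1) % N_WINDOWS
    # 7. Transfer control into the trap table.
    vector = (cpu.tba & ~0x7FFF & MASK64) | (int(tl > 1) << 14) | (trap_type << 5)
    cpu.pc = vector
    cpu.npc = vector + 4
```

Calling take_privileged_trap twice on a fresh VirtualProcessor shows the two trap tables in use: the first trap vectors into the TL = 0 table and the second, taken at TL = 1, into the TL > 0 table.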




Interrupts are ignored as long as PSTATE.ie = 0. 


Programming Note | State in TPC[n], TNPC[n], TSTATE[n], and TT[n] is only changed 
autonomously by the processor when a trap is taken while 
TL = n − 1; however, software can change any of these values 
with a WRPR instruction when TL = n. 





12.7 Exception and Interrupt Descriptions 


The following sections describe the various exceptions and interrupt requests and 
the conditions that cause them. Each exception and interrupt request describes the 
corresponding trap type as defined by the trap model. 


All other trap types are reserved. 


Note | The encoding of trap types in the UltraSPARC Architecture 
differs from that shown in The SPARC Architecture Manual-Version 9. 
Each trap is marked as precise, deferred, disrupting, or 
reset. Example exception conditions are included for each 
exception type. Chapter 7, Instructions, enumerates which traps 
can be generated by each instruction. 


The following traps are generally expected to be supported in all UltraSPARC 
Architecture 2005 implementations. A given trap is not required to be supported in 
an implementation in which the conditions that cause the trap can never occur. 


■ clean_window [TT = 024₁₆-027₁₆] (Precise): A SAVE instruction discovered 
that the window about to be used contains data from another address space; the 
window must be cleaned before it can be used. 


IMPL. DEP. #102-V9: An implementation may choose either to implement 
automatic cleaning of register windows in hardware or to generate a 
clean_window trap, when needed, so that window(s) can be cleaned by software. 
If an implementation chooses the latter option, then support for this trap type is 
mandatory. 


■ cpu_mondo [TT = 07C₁₆] (Disrupting): This interrupt is generated when 
another virtual processor has enqueued a message for this virtual processor. It is 
used to deliver a trap in privileged mode, to inform privileged software that an 
interrupt report has been appended to the virtual processor's CPU mondo queue. 
A direct message between virtual processors is sent via a CPU mondo interrupt. 
When the CPU mondo queue has a valid entry, a cpu_mondo exception is sent to 
the target virtual processor. 


■ data_access_exception [TT = 030₁₆] (Precise): An exception occurred on an 
attempted data access. 


The conditions that may cause a data_access_exception exception are: 

□ Privilege Violation: An attempt to access a privileged page (TTE.p = 1) by 
any type of load, store, or load-store instruction when executing in 
nonprivileged mode (PSTATE.priv = 0). This includes the special case of an 
access by privileged software using one of the 
ASI_AS_IF_USER_PRIMARY[_LITTLE] or 
ASI_AS_IF_USER_SECONDARY[_LITTLE] ASIs. 




















□ Illegal Access to Noncacheable Page: An access to a noncacheable page 
(TTE.cp = 0) was attempted by an atomic load-store instruction (CASA, 
CASXA, SWAP, SWAPA, LDSTUB, or LDSTUBA) or an LDTXA instruction. 


□ Illegal Access to Page That May Cause Side Effects: An attempt was made 
to access a page which may cause side effects (TTE.e = 1) by any type of load 
instruction with nonfaulting ASI. 


□ Invalid ASI: An attempt was made to execute an invalid combination of 
instruction and ASI. See the instruction descriptions in Chapter 7 for a detailed 
list of valid ASIs for each instruction that can access alternate address spaces. 
The following invalid combinations of instruction, ASI, and virtual address 
cause a data_access_exception exception: 

  - A load, store, load-store, or PREFETCHA instruction with either an invalid 
    ASI or an invalid virtual address for a valid ASI. 

  - A disallowed combination of instruction and ASI (see Block Load and Store 
    ASIs on page 415 and Partial Store ASIs on page 416). This includes the 
    following: 

    - An attempt to use a Load Twin Extended Word (LDTXA) ASI (see ASIs 10₁₆, 
      11₁₆, 16₁₆, 17₁₆ and 18₁₆ (ASI_*AS_IF_USER_*) on page 409) with any load 
      alternate opcode other than LDTXA's (which is shared by LDTWA) 

    - An attempt to use a nontranslating ASI value with any load or store alternate 
      instruction other than LDXA, LDDFA, STXA, or STDFA 

    - An attempt to read from a write-only ASI-accessible register 

    - An attempt to write to a read-only ASI-accessible register 


□ Illegal Access to Non-Faulting-Only Page: An attempt was made to access a 
non-faulting-only page (TTE.nfo = 1) by any type of load, store, or load-store 
instruction with an ASI other than a nonfaulting ASI 
(PRIMARY_NO_FAULT[_LITTLE] or SECONDARY_NO_FAULT[_LITTLE]). 

















Forward Compatibility Note | The next revision of the UltraSPARC Architecture is 
expected to replace data_access_exception with several more specific 
exceptions, one for each condition that currently can cause a 
data_access_exception. This will support slightly faster trap 
handling for these exceptions. 


■ dev_mondo [TT = 07D₁₆] (Disrupting): This interrupt causes a trap to be 
delivered in privileged mode, to inform privileged software that an interrupt 
report has been appended to its device mondo queue. When a virtual processor 
has appended a valid entry to a target virtual processor's device mondo queue, it 
sends a dev_mondo exception to the target virtual processor. The interrupt report 
contents are device specific. 


■ division_by_zero [TT = 028₁₆] (Precise): An integer divide instruction 
attempted to divide by zero. 


■ fill_n_normal [TT = 0C0₁₆-0DF₁₆] (Precise) 
■ fill_n_other [TT = 0E0₁₆-0FF₁₆] (Precise) 


A RESTORE or RETURN instruction has determined that the contents of a 
register window must be restored from memory. 


■ fp_disabled [TT = 020₁₆] (Precise): An attempt was made to execute an FPop, a 
floating-point branch, or a floating-point load/store instruction while an FPU was 
disabled (PSTATE.pef = 0 or FPRS.fef = 0). 


■ fp_exception_ieee_754 [TT = 021₁₆] (Precise): An FPop instruction generated 
an IEEE_754_exception and its corresponding trap enable mask (FSR.tem) bit was 
1. The floating-point exception type, IEEE_754_exception, is encoded in 
FSR.ftt, and specific IEEE_754_exception information is encoded in FSR.cexc. 


■ fp_exception_other [TT = 022₁₆] (Precise): An FPop instruction generated an 
exception other than an IEEE_754_exception. Examples: the FPop is 
unimplemented or execution of an FPop requires software assistance to complete. 
The floating-point exception type is encoded in FSR.ftt. 


■ htrap_instruction [TT = 180₁₆-1FF₁₆] (Precise): A Tcc instruction was executed 
in privileged mode, the trap condition evaluated to TRUE, and the software trap 
number was greater than 127. The trap is delivered in hyperprivileged mode. See 
also trap_instruction on page 449. 





■ illegal_instruction [TT = 010₁₆] (Precise): An attempt was made to execute an 
ILLTRAP instruction, an instruction with an unimplemented opcode, an 
instruction with invalid field usage, or an instruction that would result in illegal 
processor state. 


Note | An unimplemented FPop instruction generates an 
fp_exception_other exception with ftt = 3, instead of an 
illegal_instruction exception. 


Examples of cases in which illegal_instruction is generated include the following: 


□ An instruction encoding does not match any of the opcode map definitions (see 
Appendix A, Opcode Maps). 


□ A non-FPop instruction is not implemented in hardware. 
□ A reserved instruction field in a Tcc instruction is nonzero. 


If a reserved instruction field in an instruction other than Tcc is nonzero, an 
illegal_instruction exception should be, but is not required to be, generated. 
(See Reserved Opcodes and Instruction Fields on page 120.) 


□ An illegal value is present in an instruction i field. 
□ An illegal value is present in a field that is explicitly defined for an instruction, 
such as cc2, cc1, cc0, fcn, impl, op2 (IMPDEP2A, IMPDEP2B), rcond, or opf_cc. 


□ Illegal register alignment (such as odd rd value in a doubleword load 
instruction). 


□ Illegal rd value for LDXFSR, STXFSR, or the deprecated instructions LDFSR or 
STFSR. 


□ ILLTRAP instruction. 
□ DONE or RETRY when TL = 0. 


All causes of an illegal_instruction exception are described in individual 
instruction descriptions in Chapter 7, Instructions. 


■ instruction_access_exception [TT = 008₁₆] (Precise): An exception occurred 
on an instruction access. The conditions that may cause an 
instruction_access_exception exception are: 

□ Privilege Violation: An attempt to fetch an instruction from a privileged 
memory page (TTE.p = 1) while the virtual processor was executing in 
nonprivileged mode. 

□ Unauthorized Access: An attempt to fetch an instruction from a memory 
page which was missing "execute" permission (TTE.ep = 0). 

□ No-Fault Only Access: An attempt to fetch an instruction from a memory 
page which was marked for access only by nonfaulting loads (TTE.nfo = 1). 


■ interrupt_level_n [TT = 041₁₆-04F₁₆] (Disrupting): SOFTINT{n} was set to 1 or 
an external interrupt request of level n was presented to the virtual processor and 
n > PIL. 

Implementation Note | interrupt_level_14 can be caused by (1) setting SOFTINT{14} 
to 1, (2) occurrence of a "TICK match", or (3) occurrence of a 
"STICK match" (see SOFTINT^P Register (ASRs 20, 21, 22) on 
page 77). 


■ LDDF_mem_address_not_aligned [TT = 035₁₆] (Precise): An attempt was 
made to execute an LDDF or LDDFA instruction and the effective address was not 
doubleword aligned. (impl. dep. #109) 


■ mem_address_not_aligned [TT = 034₁₆] (Precise): A load/store instruction 
generated a memory address that was not properly aligned according to the 
instruction, or a JMPL or RETURN instruction generated a non-word-aligned 
address. (See also Special Memory Access ASIs on page 409.) 


■ nonresumable_error [TT = 07F₁₆] (Disrupting): There is a valid entry in the 
nonresumable error queue. This interrupt is not generated by hardware, but is 
used by hyperprivileged software to inform privileged software that an error 
report has been appended to the nonresumable error queue. 


■ privileged_action [TT = 037₁₆] (Precise): An action defined to be privileged has 
been attempted while in nonprivileged mode (PSTATE.priv = 0), or an action 
defined to be hyperprivileged has been attempted while in nonprivileged or 
privileged mode. Examples: 

□ A data access by nonprivileged software using a restricted (privileged or 
hyperprivileged) ASI, that is, an ASI in the range 00₁₆ to 7F₁₆ (inclusively) 

□ A data access by nonprivileged or privileged software using a hyperprivileged 
ASI, that is, an ASI in the range 30₁₆ to 7F₁₆ (inclusively) 

□ Execution by nonprivileged software of an instruction with a privileged 
operand value 

□ An attempt to read the TICK register by nonprivileged software when 
nonprivileged access to TICK is disabled (TICK.npt = 1). 

□ An attempt to access the PIC register (using RDPIC or WRPIC) while in 
nonprivileged mode (PSTATE.priv = 0) and nonprivileged access to PIC is 
disallowed (PCR.priv = 1). 

□ An attempt to execute a nonprivileged instruction with an operand value 
requiring more privilege than available in the current privilege mode. 


■ privileged_opcode [TT = 011₁₆] (Precise): An attempt was made to execute a 
privileged instruction while PSTATE.priv = 0. 


■ resumable_error [TT = 07E₁₆] (Disrupting): There is a valid entry in the 
resumable error queue. This interrupt is used to inform privileged software that 
an error report has been appended to the resumable error queue, and the current 
instruction stream is in a consistent state so that execution can be resumed after 
the error is handled. 


■ spill_n_normal [TT = 080₁₆-09F₁₆] (Precise) 
■ spill_n_other [TT = 0A0₁₆-0BF₁₆] (Precise) 


A SAVE or FLUSHW instruction has determined that the contents of a register 
window must be saved to memory. 


■ STDF_mem_address_not_aligned [TT = 036₁₆] (Precise): An attempt was 
made to execute an STDF or STDFA instruction and the effective address was not 
doubleword aligned. (impl. dep. #110) 

■ tag_overflow [TT = 023₁₆] (Precise) (deprecated (C2)): A TADDccTV or 
TSUBccTV instruction was executed, and either 32-bit arithmetic overflow 
occurred or at least one of the tag bits of the operands was nonzero. 

■ trap_instruction [TT = 100₁₆-17F₁₆] (Precise): A Tcc instruction was executed 
and the trap condition evaluated to TRUE, and the software trap number operand 
of the instruction is 127 or less. 


■ unimplemented_LDTW [TT = 012₁₆] (Precise): An attempt was made to execute 
an LDTW instruction that is not implemented in hardware on this 
implementation (impl. dep. #107-V9). 

■ unimplemented_STTW [TT = 013₁₆] (Precise): An attempt was made to execute 
an STTW instruction that is not implemented in hardware on this implementation 
(impl. dep. #108-V9). 

■ VA_watchpoint [TT = 062₁₆] (Precise): The virtual processor has detected an 
attempt to access a virtual address specified by the VA Watchpoint register, while 
VA watchpoints are enabled and the address is being translated from a virtual 
address to a hardware address. If the load or store address is not being translated 
from a virtual address (for example, the address is being treated as a real 
address), then a VA_watchpoint exception will not be generated even if a match is 
detected between the VA Watchpoint register and a load or store address. 


12.7.1 SPARC V9 Traps Not Used in UltraSPARC Architecture 2005


The following traps were optional in the SPARC V9 specification and are not used in 
UltraSPARC Architecture 2005: 


■ implementation_dependent_exception_n [TT = 077₁₆–07A₁₆] This range of
implementation-dependent exceptions has been replaced by a set of
architecturally-defined exceptions. (impl. dep. #35-V8-Cs20)


■ LDQF_mem_address_not_aligned [TT = 038₁₆] (Precise) — An attempt was
made to execute an LDQF instruction and the effective address was word aligned
but not quadword aligned. Use of this exception is implementation dependent
(impl. dep. #111-V9-Cs10). A separate trap entry for this exception supports fast
software emulation of the LDQF instruction when the effective address is word
aligned but not quadword aligned. See Load Floating-Point Register on page 236.
(impl. dep. #111)


■ STQF_mem_address_not_aligned [TT = 039₁₆] (Precise) — An attempt was
made to execute an STQF instruction and the effective address was word aligned
but not quadword aligned. Use of this exception is implementation dependent
(impl. dep. #112-V9-Cs10). A separate trap entry for the exception supports fast
software emulation of the STQF instruction when the effective address is word
aligned but not quadword aligned. See Store Floating-Point on page 321. (impl. dep.
#112)





12.8 Register Window Traps 


Window traps are used to manage overflow and underflow conditions in the register 
windows, support clean windows, and implement the FLUSHW instruction. 


12.8.1 Window Spill and Fill Traps 


A window overflow occurs when a SAVE instruction is executed and the next
register window is occupied (CANSAVE = 0). An overflow causes a spill trap that
allows privileged software to save the occupied register window in memory, thereby
making it available for use.


A window underflow occurs when a RESTORE instruction is executed and the
previous register window is not valid (CANRESTORE = 0). An underflow causes a
fill trap that allows privileged software to load the registers from memory.


12.8.2 clean_window Trap


The virtual processor provides the clean_window trap so that system software can 
create a secure environment in which it is guaranteed that data cannot inadvertently 
leak through register windows from one software program to another. 


A clean register window is one in which all of the registers, including uninitialized 
registers, contain either 0 or data assigned by software executing in the address 
space to which the window belongs. A clean window cannot contain register values 
from another process, that is, from software operating in a different address space. 


Supervisor software specifies the number of windows that are clean with respect to 
the current address space in the CLEANWIN register. This number includes register 
windows that can be restored (the value in the CANRESTORE register) and the 
register windows following CWP that can be used without cleaning. Therefore, the 
number of clean windows available to be used by the SAVE instruction is 


CLEANWIN - CANRESTORE 


The SAVE instruction causes a clean_window exception if this value is 0. This
behavior allows supervisor software to clean a register window before it is accessed
by a user.
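The count above can be illustrated with a small Python model. This is only a sketch: the function names are hypothetical, and the spill check that a SAVE also performs (CANSAVE = 0) is deliberately left out.

```python
def clean_windows_available(cleanwin: int, canrestore: int) -> int:
    """Clean windows still usable by SAVE: CLEANWIN - CANRESTORE."""
    return cleanwin - canrestore


def save_causes_clean_window(cleanwin: int, canrestore: int) -> bool:
    """True when a SAVE would raise clean_window (the count is 0).

    Assumption: the window-overflow (spill) check is ignored here.
    """
    return clean_windows_available(cleanwin, canrestore) == 0
```

For example, with CLEANWIN = 5 and CANRESTORE = 2, three more SAVEs can proceed before supervisor software has to clean a window.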


12.8.3 Vectoring of Fill/Spill Traps


To make handling of fill and spill traps efficient, the SPARC V9 architecture provides 
multiple trap vectors for the fill and spill traps. These trap vectors are determined as 
follows: 


■ Supervisor software can mark a set of contiguous register windows as belonging
to an address space different from the current one. The count of these register
windows is kept in the OTHERWIN register. A separate set of trap vectors
(fill_n_other and spill_n_other) is provided for spill and fill traps for these register
windows (as opposed to register windows that belong to the current address
space).

■ Supervisor software can specify the trap vectors for fill and spill traps by
presetting the fields in the WSTATE register. This register contains two subfields,
each three bits wide. The WSTATE.normal field determines one of eight spill (fill)
vectors to be used when the register window to be spilled (filled) belongs to the
current address space (OTHERWIN = 0). If the OTHERWIN register is nonzero, the
WSTATE.other field selects one of eight fill_n_other (spill_n_other) trap vectors.

See Trap-Table Entry Addresses on page 432, for more details on how the trap address
is determined.
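A minimal sketch of the vector selection for spill traps follows, in Python. The base trap types come from the spill_n_normal and spill_n_other ranges listed earlier; the stride of four trap types per vector (32 trap types divided among eight vectors) and the WSTATE bit layout (normal in bits 2:0, other in bits 5:3) are assumptions taken from SPARC V9, not statements of this section. The function name is hypothetical.

```python
SPILL_NORMAL_BASE = 0x080  # first trap type of the spill_n_normal range
SPILL_OTHER_BASE = 0x0A0   # first trap type of the spill_n_other range
VECTOR_STRIDE = 4          # assumption: 32 trap types / 8 vectors per range


def spill_trap_type(wstate: int, otherwin: int) -> int:
    """Select the spill trap type from WSTATE and OTHERWIN."""
    normal = wstate & 0x7        # assumed WSTATE.normal = bits 2:0
    other = (wstate >> 3) & 0x7  # assumed WSTATE.other = bits 5:3
    if otherwin == 0:
        # Window belongs to the current address space.
        return SPILL_NORMAL_BASE + normal * VECTOR_STRIDE
    # Window belongs to another address space.
    return SPILL_OTHER_BASE + other * VECTOR_STRIDE
```

Fill traps would be selected the same way from their own base trap types, which are not listed in this section.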


12.8.4 CWP on Window Traps 


On a window trap, the CWP is set to point to the window that must be accessed by 
the trap handler, as follows. 


Note | All arithmetic on CWP is done modulo N_REG_WINDOWS. 


■ If the spill trap occurs because of a SAVE instruction (when CANSAVE = 0), there
is an overlap window between the CWP and the next register window to be
spilled:

    CWP ← (CWP + 2) mod N_REG_WINDOWS

If the spill trap occurs because of a FLUSHW instruction, there can be unused
windows (CANSAVE) in addition to the overlap window between the CWP and
the window to be spilled:

    CWP ← (CWP + CANSAVE + 2) mod N_REG_WINDOWS

Implementation | All spill traps can set CWP by using the calculation:
Note           |     CWP ← (CWP + CANSAVE + 2) mod N_REG_WINDOWS
                 since CANSAVE is 0 whenever a trap occurs because of a SAVE
                 instruction.

■ On a fill trap, the window preceding CWP must be filled:

    CWP ← (CWP − 1) mod N_REG_WINDOWS

■ On a clean_window trap, the window following CWP must be cleaned. Then

    CWP ← (CWP + 1) mod N_REG_WINDOWS
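The three CWP adjustments can be modelled as follows. This Python sketch is illustrative only; the function name and the string labels for the trap kinds are hypothetical. It uses the single spill formula from the Implementation Note, which covers both the SAVE and FLUSHW cases.

```python
def cwp_on_window_trap(kind: str, cwp: int, cansave: int, n_reg_windows: int) -> int:
    """Return the CWP value the trap handler sees for a window trap."""
    if kind == "spill":
        # CANSAVE is 0 for SAVE-induced spills, so this covers both cases.
        return (cwp + cansave + 2) % n_reg_windows
    if kind == "fill":
        return (cwp - 1) % n_reg_windows
    if kind == "clean_window":
        return (cwp + 1) % n_reg_windows
    raise ValueError(f"unknown window trap kind: {kind}")
```

Python's `%` operator returns a non-negative result for a positive modulus, so the fill case wraps from window 0 to window N_REG_WINDOWS − 1 without extra handling.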


12.8.5 Window Trap Handlers


The trap handlers for fill, spill, and clean_window traps must handle the trap
appropriately and return, by using the RETRY instruction, to reexecute the trapped
instruction. The state of the register windows must be updated by the trap handler,
and the relationships among CLEANWIN, CANSAVE, CANRESTORE, and
OTHERWIN must remain consistent. Follow these recommendations:


■ A spill trap handler should execute the SAVED instruction for each window that
it spills.

■ A fill trap handler should execute the RESTORED instruction for each window
that it fills.

■ A clean_window trap handler should increment CLEANWIN for each window that
it cleans:

    CLEANWIN ← (CLEANWIN + 1)
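The bookkeeping behind these recommendations can be sketched in Python. The class and method names are hypothetical. The effects modelled for SAVED and RESTORED, and the invariant CANSAVE + CANRESTORE + OTHERWIN = N_REG_WINDOWS − 2, are assumptions taken from SPARC V9 rather than from this section; any CLEANWIN side effect of RESTORED is omitted.

```python
from dataclasses import dataclass


@dataclass
class WindowState:
    n_reg_windows: int
    cansave: int
    canrestore: int
    otherwin: int = 0
    cleanwin: int = 0

    def consistent(self) -> bool:
        # Assumed SPARC V9 invariant among the window-state registers.
        return self.cansave + self.canrestore + self.otherwin == self.n_reg_windows - 2

    def saved(self) -> None:
        """Model of SAVED: one window has been spilled."""
        self.cansave += 1
        if self.otherwin == 0:
            self.canrestore -= 1
        else:
            self.otherwin -= 1

    def restored(self) -> None:
        """Model of RESTORED: one window has been filled."""
        self.canrestore += 1
        if self.otherwin == 0:
            self.cansave -= 1
        else:
            self.otherwin -= 1

    def cleaned(self) -> None:
        """Model of the clean_window handler: CLEANWIN <- CLEANWIN + 1."""
        self.cleanwin += 1
```

Each operation moves one window between counters, so a state that satisfies the invariant before the handler runs still satisfies it afterwards.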
CHAPTER 13


Interrupt Handling 





Virtual processors and I/O devices can interrupt a selected virtual processor by 
assembling and sending an interrupt packet. The contents of the interrupt packet are 
defined by software convention. Thus, hardware interrupts and cross-calls can have 
the same hardware mechanism for interrupt delivery and share a common software 
interface for processing. 


The interrupt mechanism is a two-step process: 


■ sending of an interrupt request (through an implementation-specific hardware
mechanism) to an interrupt queue of the target virtual processor


■ receipt of the interrupt request on the target virtual processor and scheduling
software handling of the interrupt request


Privileged software running on a virtual processor can schedule interrupts to itself 
(typically, to process queued interrupts at a later time) by setting bits in the 
privileged SOFTINT register (see Software Interrupt Register (SOFTINT) on page 456). 


Programming | An interrupt request packet is sent by an interrupt source and is
Note        | received by the specified target in an interrupt queue. Upon
              receipt of an interrupt request packet, a special trap is invoked
              on the target virtual processor. The trap handler software
              invoked in the target virtual processor then schedules itself to
              later handle the interrupt request by posting an interrupt in the
              SOFTINT register at the desired interrupt level.





The following sections describe these aspects of interrupt handling:
■ Interrupt Packets on page 456.
■ Software Interrupt Register (SOFTINT) on page 456.
■ Interrupt Queues on page 457.
■ Interrupt Traps on page 459.


13.1 Interrupt Packets


Each interrupt is accompanied by data, referred to as an “interrupt packet”. An 
interrupt packet is 64 bytes long, consisting of eight 64-bit doublewords. The 
contents of these data are defined by software convention. 
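As an illustration of the packet size only, the following Python sketch packs eight 64-bit doublewords into a 64-byte buffer. The big-endian byte order and the function names are assumptions; the meaning of each doubleword is left to software convention, as stated above.

```python
import struct

PACKET_FORMAT = ">8Q"  # assumption: eight big-endian unsigned 64-bit doublewords
PACKET_BYTES = struct.calcsize(PACKET_FORMAT)  # 64


def pack_interrupt_packet(doublewords) -> bytes:
    """Pack exactly eight 64-bit values into a 64-byte interrupt packet."""
    words = list(doublewords)
    if len(words) != 8:
        raise ValueError("an interrupt packet holds exactly eight doublewords")
    return struct.pack(PACKET_FORMAT, *words)


def unpack_interrupt_packet(raw: bytes) -> tuple:
    """Split a 64-byte interrupt packet into its eight doublewords."""
    return struct.unpack(PACKET_FORMAT, raw)
```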





13.2 Software Interrupt Register (SOFTINT) 


To schedule interrupt vectors for processing at a later time, privileged software 
running on a virtual processor can send itself signals (interrupts) by setting bits in 
the privileged SOFTINT register. 


See SOFTINTᴾ Register (ASRs 20, 21, 22) on page 77 for a detailed description of the
SOFTINT register.


Programming | The SOFTINT register (ASR 16₁₆) is used for communication
Note        | from nucleus (privileged, TL > 0) software to privileged software
              running with TL = 0. Interrupt packets and other service
              requests can be scheduled in queues or mailboxes in memory by
              the nucleus, which then sets SOFTINT{n} to cause an interrupt
              at level n.


Programming | The SOFTINT mechanism is independent of the “mondo”
Note        | interrupt mechanism mentioned in Interrupt Queues on page 457.
              The two mechanisms do not interact.


13.2.1 Setting the Software Interrupt Register


SOFTINT{n} is set to 1 by executing a WRSOFTINT_SETᴾ instruction (WRasr using
ASR 20) with a ‘1’ in bit n of the value written (bit n corresponds to interrupt level
n). The value written to the SOFTINT_SET register is effectively ORed into the
SOFTINT register. This approach allows the interrupt handler to set one or more
bits in the SOFTINT register with a single instruction.


See SOFTINT_SETᴾ Pseudo-Register (ASR 20) on page 78 for a detailed description of
the SOFTINT_SET pseudo-register.
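The effect of a write to SOFTINT_SET can be modelled with a bitwise OR. The Python function below is a hypothetical model of the register update, not an access to real hardware, and it does not model the register's width.

```python
def wr_softint_set(softint: int, value: int) -> int:
    """Model of WRSOFTINT_SET: the written value is ORed into SOFTINT."""
    return softint | value
```

Writing a value with bits 1 and 5 set posts interrupts at levels 1 and 5 in one operation and leaves any bits that were already set untouched.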
13.2.2 Clearing the Software Interrupt Register


When all interrupts scheduled for service at level n have been serviced, kernel
software executes a WRSOFTINT_CLRᴾ instruction (WRasr using ASR 21) with a ‘1’
in bit n of the value written, to clear interrupt level n (impl. dep. #34-V8a). The
complement of the value written to the SOFTINT_CLR register is effectively ANDed
with the SOFTINT register. This approach allows the interrupt handler to clear one
or more bits in the SOFTINT register with a single instruction.


Programming | To avoid a race condition between operating system kernel 
Note | software clearing an interrupt bit and nucleus software setting 
it, software should (again) examine the queue for any valid 
entries after clearing the interrupt bit. 


See SOFTINT_CLRᴾ Pseudo-Register (ASR 21) on page 79 for a detailed description of
the SOFTINT_CLR pseudo-register.
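The clear operation, together with the re-examination of the queue recommended in the Programming Note above, can be sketched as follows. The Python names are hypothetical and the queue is an ordinary in-memory `deque` standing in for whatever structure the nucleus uses.

```python
from collections import deque


def wr_softint_clr(softint: int, value: int) -> int:
    """Model of WRSOFTINT_CLR: SOFTINT is ANDed with the complement of the value."""
    return softint & ~value


def service_level(softint: int, level: int, queue: deque, handle) -> int:
    """Drain the queue, clear bit `level`, then examine the queue again."""
    def drain() -> None:
        while queue:
            handle(queue.popleft())

    drain()
    softint = wr_softint_clr(softint, 1 << level)
    # Entries may have been queued between the last check and the clear,
    # so the queue is examined once more after the bit has been cleared.
    drain()
    return softint
```

In this single-threaded model the second drain never finds work. On real hardware it closes the window in which nucleus software queues a request just before the kernel clears the bit.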





13.3 Interrupt Queues


Interrupts are indicated to privileged mode via circular interrupt queues, each with
an associated trap vector. There are four interrupt queues, one for each of the following
types of interrupts:


■ Device mondos¹

■ CPU mondos

■ Resumable errors

■ Nonresumable errors


New interrupt entries are appended to the tail of a queue and privileged software 
reads them from the head of the queue. 


Programming | Software conventions for cooperative management of interrupt
Note        | queues and the format of queue entries are specified in the
              separate Hypervisor API Specification document.


13.3.1 Interrupt Queue Registers


The active contents of each queue are delineated by a 64-bit head register and a
64-bit tail register.

The interrupt queue registers are accessed through ASI ASI_QUEUE (25₁₆). The ASI
and address assignments for the interrupt queue registers are provided in TABLE 13-1.


1. “mondo” is a historical term, referring to the name of the original UltraSPARC 1 bus
transaction in which these interrupts were introduced.


TABLE 13-1 Interrupt Queue Register ASI Assignments

Register                          ASI                Address   Access
CPU Mondo Queue Head              25₁₆ (ASI_QUEUE)   3C0₁₆     RW
CPU Mondo Queue Tail              25₁₆ (ASI_QUEUE)   3C8₁₆     R or RW†
Device Mondo Queue Head           25₁₆ (ASI_QUEUE)   3D0₁₆     RW
Device Mondo Queue Tail           25₁₆ (ASI_QUEUE)   3D8₁₆     R or RW†
Resumable Error Queue Head        25₁₆ (ASI_QUEUE)   3E0₁₆     RW
Resumable Error Queue Tail        25₁₆ (ASI_QUEUE)   3E8₁₆     R or RW†
Nonresumable Error Queue Head     25₁₆ (ASI_QUEUE)   3F0₁₆     RW
Nonresumable Error Queue Tail     25₁₆ (ASI_QUEUE)   3F8₁₆     R or RW†

† see IMPL. DEP. #422-S10


The status of each queue is reflected by its head and tail registers: 


■ A Queue Head Register indicates the location of the oldest interrupt packet in the
queue


■ A Queue Tail Register indicates the location where the next interrupt packet will
be stored


An event that results in the insertion of a queue entry causes the tail register for that 
queue to refer to the following entry in the circular queue. Privileged code is 
responsible for updating the head register appropriately when it removes an entry 
from the queue. 


A queue is empty when the contents of its head and tail registers are equal. A queue 
is full when the insertion of one more entry would cause the contents of its head and 
tail registers to become equal. 
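The empty and full conditions can be expressed directly. In this Python sketch the head and tail are treated as byte offsets from the start of the queue and each entry occupies 64 bytes (the interrupt packet size); both are assumptions for illustration, and the function names are hypothetical.

```python
ENTRY_BYTES = 64  # assumption: one queue entry holds one 64-byte interrupt packet


def queue_is_empty(head: int, tail: int) -> bool:
    """A queue is empty when its head and tail are equal."""
    return head == tail


def queue_is_full(head: int, tail: int, queue_bytes: int) -> bool:
    """A queue is full when one more insertion would make tail equal head."""
    return (tail + ENTRY_BYTES) % queue_bytes == head
```

One consequence of this definition is that a queue of N entries can hold at most N − 1 pending entries, since the last slot is what distinguishes full from empty.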
Programming | By current convention, the format of a Queue Head or Tail
Note        | register is as follows:

                  | head/tail offset                      | 000000 |
                   63                                    6 5      0

              Under this convention:

              ■ updating a Queue Head register involves incrementing it by
                64 (size of a queue entry, in bytes)

              ■ Queue Head and Tail registers are updated using modular
                arithmetic (modulo the size of the circular queue, in bytes)

              ■ bits 5:0 always read as zeros, and attempts to write to them are
                ignored

              ■ the maximum queue offset for an interrupt queue is
                implementation dependent

              ■ behavior when a queue register is written with a value larger
                than the maximum queue offset (queue length minus the
                length of the last entry) is undefined

              This is merely a convention and is subject to change.
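Under the convention described in the note, removing entries from a queue amounts to the loop sketched below in Python. The function names are hypothetical, and offsets are taken relative to the start of the queue.

```python
ENTRY_BYTES = 64   # size of a queue entry, in bytes
LOW_BITS = 0x3F    # bits 5:0 of a queue register


def write_queue_register(value: int) -> int:
    """Model of a register write: bits 5:0 are ignored and read as zero."""
    return value & ~LOW_BITS


def advance_head(head: int, queue_bytes: int) -> int:
    """Increment the head by one entry, modulo the queue size in bytes."""
    return (head + ENTRY_BYTES) % queue_bytes


def pending_offsets(head: int, tail: int, queue_bytes: int):
    """Yield the offset of each pending entry, oldest first."""
    while head != tail:
        yield head
        head = advance_head(head, queue_bytes)
```

For a 512-byte queue with head = 384 and tail = 64, the pending entries are at offsets 384, 448, and 0, which shows the wrap at the end of the circular queue.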








13.4 Interrupt Traps


The following interrupt traps are defined in the UltraSPARC Architecture 2005: 
cpu_mondo, dev_mondo, resumable_error, and nonresumable_error. See 
Chapter 12, Traps, for details. 


UltraSPARC Architecture 2005 also supports the interrupt_level_n traps defined in 
the SPARC V9 specification. 


How interrupts are delivered is implementation-specific; see the relevant
implementation-specific Supplement to this specification for details.


CHAPTER 14


Memory Management 





An UltraSPARC Architecture Memory Management Unit (MMU) conforms to the 
requirements set forth in the SPARC V9 Architecture Manual. In particular, it supports 
a 64-bit virtual address space, simplified protection encoding, and multiple page 
sizes. 


In UltraSPARC Architecture 2005, memory management is implementation-specific. 
Basic concepts are described in this chapter, but see the relevant processor-specific 
Supplement to this specification for a detailed description of a particular processor’s 
memory management facilities. 


This chapter describes the Memory Management Unit, as observed by privileged
software, in these sections:


■ Virtual Address Translation on page 461.
■ TSB Translation Table Entry (TTE) on page 462.
■ Translation Storage Buffer (TSB) on page 466.





14.1 Virtual Address Translation


The MMUs may support up to four page sizes: 8 KBytes, 64 KBytes, 4 MBytes, and
256 MBytes. The 8-KByte, 64-KByte, and 4-MByte page sizes must be supported; other
page sizes are optional.


Privileged software manages virtual-to-real address translations. 


Privileged software maintains translation information in an arbitrary data structure, 
called the software translation table. 


The Translation Storage Buffer (TSB) is an array of Translation Table Entries which
serves as a cache of the software translation table, used to quickly reload the TLB in
the event of a TLB miss.

A conceptual view of privileged-mode memory management is shown in
FIGURE 14-1. The software translation table is likely to be large and complex. The
translation storage buffer (TSB), which acts like a direct-mapped cache, is the
interface between the software translation table and the underlying memory
management hardware. The TSB can be shared by all processes running on a virtual
processor or can be process specific; the hardware does not require any particular
scheme. There can be several TSBs.


[Figure labels: RA ← VA; Software Translation Table (Operating System Data Structure);
Translation Storage Buffer (TSB); Memory; Managed by privileged mode software]

FIGURE 14-1 Conceptual View of the MMU





14.2 TSB Translation Table Entry (TTE) 


The Translation Storage Buffer (TSB) Translation Table Entry (TTE) is the equivalent
of a page table entry as defined in the Sun4v Architecture Specification; it holds
information for a single page mapping. The TTE is divided into two 64-bit words
representing the tag and data of the translation. Just as in a hardware cache, the tag
is used to determine whether there is a hit in the TSB; if there is a hit, the data are
used by privileged software.


The TTE configuration is illustrated in FIGURE 14-2 and described in TABLE 14-1. 


TTE Tag
    | context_id | 000000 | va |
     63        48 47    42 41 0

TTE Data
    | v  | nfo | soft2 | taddr | ie | e  | cp | cv | p | ep | w | soft | sz  |
      63   62   61  56  55  13   12   11   10   9    8   7    6   5  4   3  0


FIGURE 14-2 Translation Storage Buffer (TSB) Translation Table Entry (TTE)
TABLE 14-1 TSB TTE Bit Description

Bit            Field        Description

Tag – 63:48    context_id   The 16-bit context ID associated with the TTE.

Tag – 47:42    —            These bits must be zero for a tag match.

Tag – 41:0     va           Bits 63:22 of the Virtual Address (the virtual page number). Bits 21:13
                            of the VA are not maintained because these bits index the minimally
                            sized, direct-mapped TSBs.

Data – 63      v            Valid. If v = 1, then the remaining fields of the TTE are meaningful,
                            and the TTE can be used; otherwise, the TTE cannot be used to
                            translate a virtual address.

                            Programming | The explicit Valid bit is (intentionally) redundant with the
                            Note        | software convention of encoding an invalid TTE with an
                                          unused context ID. The encoding of the context_id field is
                                          necessary to cause a failure in the TTE tag comparison,
                                          while the explicit Valid bit in the TTE data simplifies the
                                          TTE miss handler.

Data – 62      nfo          No Fault Only. If nfo = 1, loads with ASI_PRIMARY_NO_FAULT{_LITTLE} or
                            ASI_SECONDARY_NO_FAULT{_LITTLE} are translated. Any other data access
                            with the D/UMMU TTE.nfo = 1 will trap with a data_access_exception. An
                            instruction fetch access to a page with the IMMU TTE.nfo = 1 results in an
                            instruction_access_exception exception.

Data – 61:56   soft2        Software-defined field, provided for use by the operating system. The
                            soft2 field can be written with any value in the TSB. Hardware is not
                            required to maintain this field in any TLB (or uTLB), so when it is read
                            from the TLB (uTLB), it may read as zero.

Data – 55:13   taddr        Target address; the underlying address (Real Address {55:13}) to which
                            the MMU will map the page.

                            IMPL. DEP. #238-U3: When page offset bits for larger page sizes are
                            stored in the TLB, it is implementation dependent whether the data
                            returned from those fields by a Data Access read is zero or the data
                            previously written to them.

Data – 12      ie           Invert Endianness. If ie = 1 for a page, accesses to the page are
                            processed with inverse endianness from that specified by the
                            instruction (big for little, little for big).

                            Programming | (1) The primary purpose of this bit is to aid in the mapping
                            Notes       | of I/O devices (through noncacheable memory addresses)
                                          whose registers contain and expect data in little-endian
                                          format. Setting TTE.ie = 1 allows those registers to be
                                          accessed correctly by big-endian programs using ordinary
                                          loads and stores, such as those typically issued by
                                          compilers; otherwise little-endian loads and stores would
                                          have to be issued by hand-written assembler code.
                                          (2) This bit can also be used when mapping cacheable
                                          memory. However, cacheable accesses to pages marked
                                          with TTE.ie = 1 may be slower than accesses to the page
                                          with TTE.ie = 0. For example, an access to a cacheable
                                          page with TTE.ie = 1 may perform as if there was a miss in
                                          the first-level data cache.

                            Implementation | Some implementations may require cacheable accesses to
                            Note           | pages tagged with TTE.ie = 1 to bypass the data cache,
                                             adding latency to those accesses.

                            IMPL. DEP. #: The ie bit in the IMMU is ignored during ITLB operation.
                            It is implementation dependent if it is implemented and how it is read
                            and written.

Data – 11      e            Side effect. If the side-effect bit is set to 1, loads with
                            ASI_PRIMARY_NO_FAULT, ASI_SECONDARY_NO_FAULT, and their *_LITTLE
                            variations will trap for addresses within the page, noncacheable
                            memory accesses other than block loads and stores are strongly ordered
                            against other e-bit accesses, and noncacheable stores are not merged.
                            This bit should be set to 1 for pages that map I/O devices having side
                            effects. Note, also, that the e bit causes the prefetch instruction to
                            be treated as a nop, but does not prevent normal (hardware)
                            instruction prefetching.

                            Note 1: The e bit does not force a noncacheable access. It is expected,
                            but not required, that the cp and cv bits will be set to 0 when the e
                            bit is set to 1. If both the cp and cv bits are set to 1 along with the
                            e bit, the result is undefined.

                            Note 2: The e bit and the nfo bit are mutually exclusive; both bits
                            should never be set to 1 in any TTE.

Data – 10      cp,          The cacheable-in-physically-indexed-cache bit and
Data – 9       cv           cacheable-in-virtually-indexed-cache bit determine the cacheability of
                            the page. Given an implementation with a physically indexed
                            instruction cache, a virtually indexed data cache, and a physically
                            indexed unified second-level cache, the following table illustrates
                            how the cp and cv bits could be used:

                            Cacheable   Meaning of TTE when placed in:
                            (cp:cv)     I-TLB (Instruction Cache PA-indexed)   D-TLB (Data Cache VA-indexed)
                            00, 01      Noncacheable                           Noncacheable
                            10          Cacheable L2-cache, I-cache            Cacheable L2-cache
                            11          Cacheable L2-cache, I-cache            Cacheable L2-cache, D-cache

                            The MMU does not operate on the cacheable bits but merely passes them
                            through to the cache subsystem. The cv bit in the IMMU is read as zero
                            and ignored when written.

                            IMPL. DEP. #226-U3: Whether the cv bit is supported in hardware is
                            implementation dependent in the UltraSPARC Architecture. The cv bit in
                            hardware should be provided if the implementation has virtually
                            indexed caches, and the implementation should support hardware
                            unaliasing for the caches.

Data – 8       p            Privileged. If p = 1, only privileged software can access the page
                            mapped by the TTE. If p = 1 and an access to the page is attempted by
                            nonprivileged mode (PSTATE.priv = 0), then the MMU signals an
                            instruction_access_exception exception or data_access_exception
                            exception.

Data – 7       ep           Executable. If ep = 1, the page mapped by this TTE has execute
                            permission granted. Instructions may be fetched and executed from this
                            page. If ep = 0, an attempt to execute an instruction from this page
                            results in an instruction_access_exception exception.

                            IMPL. DEP. #

Data – 6       w            Writable. If w = 1, the page mapped by this TTE has write permission
                            granted. Otherwise, write permission is not granted.

                            IMPL. DEP. #

Data – 5:4     soft         Software-defined field, provided for use by the operating system. The
                            soft field can be written with any value in the TSB. Hardware is not
                            required to maintain this field in any TLB (or uTLB), so when it is read
                            from the TLB (or uTLB), it may read as zero.

Data – 3:0     sz           The page size of this entry, encoded as shown below.

                            sz          Page Size
                            0000        8 Kbyte
                            0001        64 Kbyte
                            0010        Reserved
                            0011        4 Mbyte
                            0100        Reserved
                            0101        256 Mbyte
                            0110        Reserved
                            0111        Reserved
                            1000–1111   Reserved
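The field layout in TABLE 14-1 can be exercised with a short Python sketch. The bit positions are those given in the table and in FIGURE 14-2, with p taken as bit 8 as the figure's bit-position row implies. The function and dictionary names are hypothetical, and nothing here models TLB behavior.

```python
# (high bit, low bit) of each TTE data field
TTE_DATA_FIELDS = {
    "v": (63, 63), "nfo": (62, 62), "soft2": (61, 56), "taddr": (55, 13),
    "ie": (12, 12), "e": (11, 11), "cp": (10, 10), "cv": (9, 9),
    "p": (8, 8), "ep": (7, 7), "w": (6, 6), "soft": (5, 4), "sz": (3, 0),
}

# sz encodings that are not reserved, in bytes
PAGE_SIZES = {0b0000: 8 << 10, 0b0001: 64 << 10, 0b0011: 4 << 20, 0b0101: 256 << 20}


def get_field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi:lo of a 64-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_tte_data(data: int) -> dict:
    """Split a TTE data word into its named fields."""
    return {name: get_field(data, hi, lo) for name, (hi, lo) in TTE_DATA_FIELDS.items()}


def make_tte_tag(context_id: int, va: int) -> int:
    """Build a TTE tag: context_id in bits 63:48, VA{63:22} in bits 41:0."""
    return ((context_id & 0xFFFF) << 48) | ((va >> 22) & ((1 << 42) - 1))
```

A decoded `taddr` value is shifted left by 13 to recover the real address of the page, and `PAGE_SIZES` raises `KeyError` for a reserved `sz` encoding.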








14.3 Translation Storage Buffer (TSB) 


The Translation Storage Buffer (TSB) is an array of Translation Table Entries 
managed entirely by privileged software. It serves as a cache of the software 
translation table, used to quickly reload the TLB in the event of a TLB miss. 


14.3.1 TSB Indexing Support 


Hardware TSB indexing support via TSB pointers should be provided for the TTEs. 


14.3.2 TSB Cacheability and Consistency 


The TSB exists as a data structure in memory and therefore can be cached. Indeed, 
the speed of the TLB miss handler relies on the TSB accesses hitting the level-2 cache 
at a substantial rate. This policy may result in some conflicts with normal 
instruction and data accesses, but the dynamic sharing of the level-2 cache resource 
will provide a better overall solution than that provided by a fixed partitioning. 


Programming | When software updates the TSB, it is responsible for ensuring
Note        | that the store(s) used to perform the update are made visible in
              the memory system (for access by subsequent loads, stores, and
              load-stores) by use of an appropriate MEMBAR instruction.

              Making a TSB update visible to fetches of instructions
              subsequent to the store(s) that updated the TSB may require
              execution of instructions such as FLUSH, DONE, or RETRY, in
              addition to the MEMBAR.


14.3.3 TSB Organization


The TSB is arranged as a direct-mapped cache of TTEs. 


In each case, the n least significant bits of the respective virtual page number are used as
the offset from the TSB base address, with n equal to log base 2 of the number of
TTEs in the TSB.


The TSB organization is illustrated in FIGURE 14-3. The number of TTEs in the TSB, 2ⁿ,
can range from 512 to an implementation-dependent number.





    Tag#1 (8 bytes)       Data#1 (8 bytes)
          ⋮                     ⋮                2ⁿ lines in TSB
    Tag#2ⁿ (8 bytes)      Data#2ⁿ (8 bytes)


FIGURE 14-3 TSB Organization
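The indexing described above can be sketched in Python. Each TTE is taken to occupy 16 bytes (an 8-byte tag followed by 8 bytes of data, as in FIGURE 14-3). The default page shift of 13 corresponds to 8-KByte pages and is an assumption for the example; the function name is hypothetical.

```python
TTE_BYTES = 16  # 8-byte tag + 8-byte data


def tsb_entry_address(tsb_base: int, va: int, n: int, page_shift: int = 13) -> int:
    """Address of the TTE that a virtual address maps to in a direct-mapped TSB.

    n is log2 of the number of TTEs in the TSB.
    """
    vpn = va >> page_shift           # virtual page number
    index = vpn & ((1 << n) - 1)     # n least significant bits of the VPN
    return tsb_base + index * TTE_BYTES
```

With n = 9 (512 entries) and 8-KByte pages, the index is VA{21:13}. Two virtual pages whose numbers differ by a multiple of 512 map to the same TSB line, which is why the tag comparison is needed.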


14.3.4 Accessing MMU Registers


All internal MMU registers can be accessed directly by the virtual processor through
defined ASIs, using LDXA and STXA instructions. UltraSPARC Architecture-compatible
processors do not require a MEMBAR #Sync, FLUSH, DONE, or RETRY
instruction after a store to an MMU register for proper operation.


TABLE 14-2 lists the MMU registers and provides references to sections with more 
details. 


TABLE 14-2 MMU Internal Registers and ASI Operations

IMMU ASI   D/UMMU ASI   VA{63:0}   Access   Register or Operation Name
21₁₆                    8₁₆        RW       Primary Context ID register
—          21₁₆         10₁₆       RW       Secondary Context ID register


APPENDIX A





Opcode Maps 


This appendix contains the UltraSPARC Architecture 2005 instruction opcode maps. 


In this appendix and in Chapter 7, Instructions, certain opcodes are marked with 
mnemonic superscripts. These superscripts and their meanings are defined in 
TABLE 7-1 on page 124. For preferred substitute instructions for deprecated opcodes, 
see the individual opcodes in Chapter 7 that are labeled “Deprecated”. 


In the tables in this appendix, reserved (—) and shaded entries (as defined below) 
indicate opcodes that are not implemented in UltraSPARC Architecture 2005 strands. 


An attempt to execute opcode will cause an illegal_instruction exception.


An attempt to execute opcode will cause an fp_exception_other exception with
FSR.ftt = 3 (unimplemented_FPop).








An attempt to execute a reserved opcode behaves as defined in Reserved Opcodes and 
Instruction Fields on page 120. 


TABLE A-1 op{1:0}

op{1:0}
0         Branches and SETHI (See TABLE A-2)
1         CALL
2         Arithmetic & Miscellaneous (See TABLE A-3)
3         Loads/Stores (See TABLE A-4)





TABLE A-2 op2{2:0} (op = 0)

op2{2:0}
0         ILLTRAP
1         BPcc (See TABLE A-7)
2         Biccᴰ (See TABLE A-7)
3         BPr (bit 28 = 0) (See TABLE A-8)
          — (bit 28 = 1)¹
4         SETHI
          NOP²
5         FBPfcc (See TABLE A-7)
6         FBfccᴰ (See TABLE A-7)
7         —

1. See the footnote regarding bit 28 on page 148.
2. rd = 0, imm22 = 0
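As an informal illustration of how TABLE A-1 and TABLE A-2 are read, the Python sketch below classifies a 32-bit instruction word. The field positions (op in bits 31:30, rd in bits 29:25, op2 in bits 24:22, imm22 in bits 21:0) are the standard SPARC instruction formats and are stated here as assumptions, since this appendix does not repeat them. The function name is hypothetical.

```python
OP_CLASSES = {
    1: "CALL",
    2: "Arithmetic & Miscellaneous",
    3: "Loads/Stores",
}
OP2_NAMES = {0: "ILLTRAP", 1: "BPcc", 2: "Bicc", 3: "BPr",
             4: "SETHI", 5: "FBPfcc", 6: "FBfcc"}


def classify(insn: int) -> str:
    """Classify an instruction word using op{1:0} and, for op = 0, op2{2:0}."""
    op = (insn >> 30) & 0x3
    if op != 0:
        return OP_CLASSES[op]
    op2 = (insn >> 22) & 0x7
    if op2 == 7:
        return "reserved"
    if op2 == 3 and (insn >> 28) & 1:
        return "reserved"            # BPr encoding with bit 28 = 1
    if op2 == 4:
        rd = (insn >> 25) & 0x1F
        imm22 = insn & 0x3FFFFF
        if rd == 0 and imm22 == 0:
            return "NOP"             # SETHI with rd = 0 and imm22 = 0
    return OP2_NAMES[op2]
```

The word `0x01000000` decodes as NOP and `0x40000000` as CALL. Further decoding of op = 2 and op = 3 would continue with the op3 field as laid out in TABLE A-3 and TABLE A-4.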
TABLE A-3 op3{5:0} (op = 10₂)

Rows are selected by op3{3:0} and columns by op3{5:4}. Only the cells that survive in this copy are listed.

op3{3:0} = 0
    op3{5:4} = 0    ADD
    op3{5:4} = 1    ADDcc
    op3{5:4} = 2    TADDcc
    op3{5:4} = 3    WRYᴰ (rd = 0)
                    — (rd = 1)
                    WRCCR (rd = 2)
                    WRASI (rd = 3)
                    — (rd = 4, 5)
                    WRFPRS (rd = 6)
                    — (rd = 7–14)
                    WRasrᴾᴬˢᴿ (7 ≤ rd ≤ 14)
                    SIR (rd = 15, rs1 = 0, i = 1)
                    — ((rd = 15) and (rs1 ≠ 0 or i ≠ 1))
                    WRPCRᴾ (rd = 16)
                    WRPIC (rd = 17)
                    — (rd = 18)
                    WRGSR (rd = 19)
                    WRSOFTINT_SETᴾ (rd = 20)
                    WRSOFTINT_CLRᴾ (rd = 21)
                    WRSOFTINTᴾ (rd = 22)
                    WRTICK_CMPRᴾ (rd = 23)
                    WRSTICK_CMPRᴾ (rd = 25)
                    — (rd = 26–31)

op3{3:0} = 1
    op3{5:4} = 0    AND
    op3{5:4} = 1    ANDcc
    op3{5:4} = 2    TSUBcc
    op3{5:4} = 3    SAVEDᴾ (fcn = 0)
                    RESTOREDᴾ (fcn = 1)
                    ALLCLEANᴾ (fcn = 2)
                    OTHERWᴾ (fcn = 3)
                    NORMALWᴾ (fcn = 4)
                    INVALWᴾ (fcn = 5)
                    — (fcn ≥ 6)

op3{3:0} = 2
    op3{5:4} = 0    OR
    op3{5:4} = 1    ORcc
    op3{5:4} = 2    TADDccTVᴰ
    op3{5:4} = 3    WRPRᴾ (rd = 0–14 or 16)

op3{3:0} = 4
    op3{5:4} = 0    SUB
    op3{5:4} = 1    SUBcc
    op3{5:4} = 2    MULSccᴰ
    op3{5:4} = 3    FPop1 (See TABLE A-5)

op3{3:0} = 5
    op3{5:4} = 3    FPop2 (See TABLE A-6)

op3{3:0} = 6
    op3{5:4} = 0    ORN
    op3{5:4} = 1    ORNcc
    op3{5:4} = 2    SRL (x = 0), SRLX (x = 1)
    op3{5:4} = 3    IMPDEP1 (VIS) (See TABLE A-12)

op3{3:0} = 7
    op3{5:4} = 0    XNOR
    op3{5:4} = 1    XNORcc
    op3{5:4} = 2    SRA (x = 0), SRAX (x = 1)
    op3{5:4} = 3    IMPDEP2

op3{3:0} = 8
    op3{5:4} = 0    ADDC
    op3{5:4} = 1    ADDCcc
    op3{5:4} = 2    RDYᴰ (rs1 = 0, i = 0)
                    — (rs1 = 1, i = 0)
                    RDCCR (rs1 = 2, i = 0)
                    RDASI (rs1 = 3, i = 0)
                    RDTICKᴾⁿᵖᵗ (rs1 = 4, i = 0)
                    RDPC (rs1 = 5, i = 0)
                    RDFPRS (rs1 = 6, i = 0)
                    RDasrᴾᴬˢᴿ (7 ≤ rd ≤ 14, i = 0)
                    MEMBAR (rs1 = 15, rd = 0, i = 1, instruction bit 12 = 0)
                    — (rs1 = 15, rd = 0, i = 1, instruction bit 12 = 1)
                    — (i = 1, (rs1 ≠ 15 or rd ≠ 0))
                    STBAR (rs1 = 15, rd = 0, i = 0)
                    — (rs1 = 15 and rd > 0 and i = 0)
                    RDPCRᴾ (rs1 = 16 and i = 0)
                    RDPIC (rs1 = 17 and i = 0)
                    — (rs1 = 18 and i = 0)
                    RDGSR (rs1 = 19 and i = 0)
                    — ((rs1 = 20 or 21) and (i = 0))
                    RDSOFTINTᴾ (rs1 = 22 and i = 0)
                    RDTICK_CMPRᴾ (rs1 = 23 and i = 0)
                    RDSTICK (rs1 = 24 and i = 0)
                    RDSTICK_CMPRᴾ (rs1 = 25 and i = 0)
                    — ((rs1 = 26–31) and (i = 0))
    op3{5:4} = 3    JMPL

op3{3:0} = 9
    op3{5:4} = 3    RETURN

op3{3:0} = A
    op3{5:4} = 2    RDPR (rs1 = 1–14 or 16)
                    — (rs1 = 15 or 17–31)
    op3{5:4} = 3    Tcc ((i = 0 and inst{10:5} = 0) or ((i = 1) and (inst{10:8} = 0)))
                    (See TABLE A-7)
                    — (bit 29 = 1)
                    — ((i = 0 and (inst{10:5} ≠ 0)) or (i = 1 and (inst{10:8} ≠ 0)))

op3{3:0} = B
    op3{5:4} = 1    SMULccᴰ
    op3{5:4} = 2    FLUSHW
    op3{5:4} = 3    FLUSH

op3{3:0} = C
    op3{5:4} = 1    SUBCcc
    op3{5:4} = 2    MOVcc
    op3{5:4} = 3    SAVE

op3{3:0} = D
    op3{5:4} = 1    —
    op3{5:4} = 2    SDIVX
    op3{5:4} = 3    RESTORE

op3{3:0} = E
    op3{5:4} = 1    UDIVccᴰ
    op3{5:4} = 2    POPC (rs1 = 0)
                    — (rs1 > 0)
    op3{5:4} = 3    DONEᴾ (fcn = 0)
                    RETRYᴾ (fcn = 1)
                    — (fcn = 2..15)
                    — (fcn = 16..31)

op3{3:0} = F
    op3{5:4} = 2    MOVr (See TABLE A-8)
    op3{5:4} = 3    —
TABLE A-4 op3{5:0} (op = 11₂)

[Table layout not recoverable from this copy. Columns are indexed by op3{5:4}. Surviving entries, in reading order:]

LDF, LDFAᴾᴬˢᴵ
LDUBAᴾᴬˢᴵ, LDFSRᴰ (rd = 0), LDXFSR (rd = 1)
LDUHAᴾᴬˢᴵ, LDQFAᴾᴬˢᴵ
LDTWAᴰ,ᴾᴬˢᴵ, LDTXA, — (rd odd)
LDDFAᴾᴬˢᴵ, LDBLOCKF, LDSHORTF
STWAᴾᴬˢᴵ, STFAᴾᴬˢᴵ
STFSRᴰ, STXFSR
STQFAᴾᴬˢᴵ
STTWᴰ, — (rd odd), STDF, STDFAᴾᴬˢᴵ, STBLOCKF, STPARTIALF, STSHORTF
LDSB, Reserved, Reserved
LDSH, Reserved, Reserved
LDX, Reserved, Reserved
Reserved, Reserved, —, CASAᴾᴬˢᴵ
LDSTUB, LDSTUBAᴾᴬˢᴵ, PREFETCH, PREFETCHAᴾᴬˢᴵ
— (fcn = 5–15), — (fcn = 5–15)
TABLE A-5 opf{8:0} (op = 10₂, op3 = 34₁₆ = FPop1)


opf{3:0} 





opf{8:4} 
0016 


FMOVs 


FMOVd 


3 
FMOVq 


4 





0146 





0246 
0316 





0516 








0616 





0716 
0816 








0916 
0A46 





OB1g 
OD56 














OE46-1F4g 


0016 





FABSs 


FABSd 





0216 
0316 





FSQRTs


FSQRTd








0446 
0516 


FMULs 


FMULd 





FDIVd 








0616 
0716 


FsMULd 


FdMULq 





0816 
0916 
0A46 





0B16 
0C16 











0D16 
0E16-1F16 
TABLE A-6 opf{8:0} (op = 10₂, op3 = 35₁₆ = FPop2)


opf{3:0} 





opf(8:4) | 0 1 


FMOVq (fcc0)


4 


5 6 7 


ti 











FMOVRsZ† |FMOVRdZ† |FMOVRqZ†





FMOVq (fcc1)





FCMPq 








FMOVq (fcc2) 














FMOVq (fcc3)











FMOVq (icc)





FMOVq (xcc) 








* Reserved variation of FMOVR 








† bit 13 of instruction = 0
TABLE A-7 cond{3:0}

cond   BPcc      Bicc      FBPfcc    FBfcc     Tcc
       op = 0    op = 0    op = 0    op = 0    op = 2
       op2 = 1   op2 = 2   op2 = 5   op2 = 6   op3 = 3A₁₆
0      BPN       BN        FBPN      FBN       TN
3      BPL       BL        FBPUL     FBUL      TL
4      BPLEU     BLEU      FBPL      FBL       TLEU
6      BPNEG     BNEG      FBPG      FBG       TNEG
7      BPVS      BVS       FBPU      FBU       TVS
9      BPNE      BNE       FBPE      FBE       TNE
A      BPG       BG        FBPUE     FBUE      TG
C      BPGU      BGU       FBPUGE    FBUGE     TGU
D      BPCC      BCC       FBPLE     FBLE      TCC
F      BPVC      BVC       FBPO      FBO       TVC
TABLE A-8 Encoding of rcond{2:0} Instruction Field

rcond{2:0}   BPr        MOVr          FMOVr
             op = 0     op = 2        op = 2
             op2 = 3    op3 = 2F₁₆    op3 = 35₁₆
0            —          —             —
1            BRZ        MOVRZ         FMOVR«s|d|q»Z
2            BRLEZ      MOVRLEZ       FMOVR«s|d|q»LEZ
3            BRLZ       MOVRLZ        FMOVR«s|d|q»LZ
4            —          —             —
5            BRNZ       MOVRNZ        FMOVR«s|d|q»NZ
6            BRGZ       MOVRGZ        FMOVR«s|d|q»GZ
7            BRGEZ      MOVRGEZ       FMOVR«s|d|q»GEZ
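
As an illustration of TABLE A-8, the following Python sketch maps an rcond{2:0} value to the register test suggested by each mnemonic suffix (Z, LEZ, LZ, NZ, GZ, GEZ). The names RCOND_TESTS, to_signed64, and rcond_taken are hypothetical, and the sketch assumes the tested register is interpreted as a signed 64-bit value; encodings 0 and 4 are treated as reserved.

```python
# Hypothetical helper illustrating TABLE A-8; not part of the specification.
RCOND_TESTS = {
    1: ("Z", lambda r: r == 0),
    2: ("LEZ", lambda r: r <= 0),
    3: ("LZ", lambda r: r < 0),
    5: ("NZ", lambda r: r != 0),
    6: ("GZ", lambda r: r > 0),
    7: ("GEZ", lambda r: r >= 0),
}


def to_signed64(value):
    """Interpret a 64-bit register image as a two's-complement integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


def rcond_taken(rcond, reg_value):
    """Return True if the register condition selected by rcond holds."""
    if rcond not in RCOND_TESTS:
        raise ValueError(f"rcond {rcond} is a reserved encoding")
    _, test = RCOND_TESTS[rcond]
    return test(to_signed64(reg_value))
```

For example, rcond_taken(3, 0xFFFFFFFFFFFFFFFF) evaluates the LZ test on a register whose signed value is negative.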
TABLE A-9 cc / opf_cc Fields (MOVcc and FMOVcc)


opf_cc                 Condition Code
cc2 | cc1 | cc0        Selected
































TABLE A-10 cc Fields (FBPfcc, FCMP, and FCMPE)

cc1   cc0   Condition Code Selected
0     0     fcc0
0     1     fcc1
1     0     fcc2
1     1     fcc3














TABLE A-11 cc Fields (BPcc and Tcc)

cc1   cc0   Condition Code Selected
0     0     icc
0     1     —
1     0     xcc
1     1     —
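
The two-bit cc selections in TABLE A-10 and TABLE A-11 can be sketched as simple lookups. The Python below is a minimal illustration only; the names FP_CC, INT_CC, select_fp_cc, and select_int_cc are hypothetical, and the sketch assumes the unassigned (—) encodings should be rejected rather than decoded.

```python
# Hypothetical lookup tables mirroring TABLE A-10 and TABLE A-11.
FP_CC = {(0, 0): "fcc0", (0, 1): "fcc1", (1, 0): "fcc2", (1, 1): "fcc3"}
INT_CC = {(0, 0): "icc", (1, 0): "xcc"}  # (0, 1) and (1, 1) are unassigned


def select_fp_cc(cc1, cc0):
    """Condition code selected for FBPfcc, FCMP, and FCMPE."""
    return FP_CC[(cc1, cc0)]


def select_int_cc(cc1, cc0):
    """Condition code selected for BPcc and Tcc."""
    try:
        return INT_CC[(cc1, cc0)]
    except KeyError:
        raise ValueError(f"cc1={cc1}, cc0={cc0} is not assigned") from None
```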
TABLE A-12 IMPDEP1: opf{8:0} for VIS opcodes (op = 10₂, op3 = 36₁₆)








opf {8:4} 





01 03 
ARRAY8 


ARRAY8 


05 


06 


FZERO 


FZERO 


08 





FZEROS 





ARRAY16 


FNOR 
FNORS 








FXNOR 
FXNORS 





ARRAY32 FCMPLE32 


FMUL8x16AL


FPSUB32 


FANDNOT2 


FSRC1 


FORNOT2 





FPSUB32S 


FORNOT2S 





ALIGNADDRESS








ALIGN 


6 
7 
8 
FMULD8ULx16
FCMPEQ16 |FPACK32 
ADDRESS 
LITTLE 


FPMERGE 


FANDNOT1 


FN 





FSRC2 


FANDNOT1S |FSRC2S








B |EDGE32LN — — FPACK16 
C — = FCMPGT32 — 


BSHUFFLE 
FEXPAND 


FORNOT1S
FOR 











FPACKFIX 
FCMPEQ32 |PDIST
F A. m 








FNANDS 





FORS 
FONE 


FONES 
TABLE A-14 IMPDEP1: opf{8:0} for VIS opcodes (op = 10₂, op3 = 36₁₆) (3 of 3)


opf {8:4} 
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f back later for a copy of UltraSPARC Architecture / 
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y 2005 containing the final version of this chapter. 
Implementation Dependencies





This appendix summarizes implementation dependencies in the SPARC V9
standard. In SPARC V9, the notation “IMPL. DEP. #nn:” identifies the definition of
an implementation dependency; the notation “(impl. dep. #nn)” identifies a reference
to an implementation dependency. These dependencies are described by their
number nn in TABLE B-1 on page 481.


The appendix contains these sections: 


■ Definition of an Implementation Dependency on page 479.
■ Hardware Characteristics on page 480.
■ Implementation Dependency Categories on page 480.
■ List of Implementation Dependencies on page 481.





B.1 Definition of an Implementation Dependency


The SPARC V9 architecture is a model that specifies unambiguously the behavior 
observed by software on SPARC V9 systems. Therefore, it does not necessarily 
describe the operation of the hardware of any actual implementation. 


An implementation is not required to execute every instruction in hardware. An 
attempt to execute a SPARC V9 instruction that is not implemented in hardware 
generates a trap. Whether an instruction is implemented directly by hardware, 
simulated by software, or emulated by firmware is implementation dependent. 




The two levels of SPARC V9 compliance are described in UltraSPARC Architecture 
2005 Compliance with SPARC V9 Architecture on page 23. 


Some elements of the architecture are defined to be implementation dependent. 
These elements include certain registers and operations that may vary from 
implementation to implementation; they are explicitly identified as such in this 
appendix. 


Implementation elements (such as instructions or registers) that appear in an 
implementation but are not defined in this document (or its updates) are not 
considered to be SPARC V9 elements of that implementation. 





B.2 Hardware Characteristics


Hardware characteristics that do not affect the behavior observed by software on 
SPARC V9 systems are not considered architectural implementation dependencies. A 
hardware characteristic may be relevant to the user system design (for example, the 
speed of execution of an instruction) or may be transparent to the user (for example, 
the method used for achieving cache consistency). The SPARC International 
document, Implementation Characteristics of Current SPARC V9-based Products, Revision 
9.x, provides a useful list of these hardware characteristics, along with the list of 
implementation-dependent design features of SPARC V9-compliant 
implementations. 


In general, hardware characteristics deal with
■ Instruction execution speed
■ Whether instructions are implemented in hardware
■ The nature and degree of concurrency of the various hardware units constituting
a SPARC V9 implementation





B.3 Implementation Dependency Categories


Many of the implementation dependencies can be grouped into four categories,
abbreviated by their first letters throughout this appendix:

■ Value (v)
The semantics of an architectural feature are well defined, except that a value
associated with the feature may differ across implementations. A typical example
is the number of implemented register windows (impl. dep. #2-V8).

■ Assigned Value (a)
The semantics of an architectural feature are well defined, except that a value
associated with the feature may differ across implementations and the actual
value is assigned by SPARC International. Typical examples are the impl field of
the Version register (VER) (impl. dep. #13-V8) and the FSR.ver field (impl. dep.
#19-V8).

■ Functional Choice (f)
The SPARC V9 architecture allows implementors to choose among several
possible semantics related to an architectural function. A typical example is the
treatment of a catastrophic error exception, which may cause either a deferred or
a disrupting trap (impl. dep. #31-V8-Cs10).

■ Total Unit (t)
The existence of the architectural unit or function is recognized, but details are
left to each implementation. Examples include the handling of I/O registers
(impl. dep. #7-V8) and some alternate address spaces (impl. dep. #29-V8).
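
As a small illustration of the reference notation and category letters used in this appendix, the following Python sketch extracts “(impl. dep. #nn…)” references from a line of text. The names CATEGORIES and parse_impl_dep are hypothetical, and the sketch makes no assumption about what the suffix tokens (such as V8 or Cs10) mean; it only splits them out.

```python
import re

# Category letters as listed in this appendix.
CATEGORIES = {
    "v": "Value",
    "a": "Assigned Value",
    "f": "Functional Choice",
    "t": "Total Unit",
}

# Matches e.g. "impl. dep. #31-V8-Cs10": a number plus optional "-token" suffixes.
_REF = re.compile(r"impl\. dep\. #(\d+)((?:-[A-Za-z0-9]+)*)")


def parse_impl_dep(text):
    """Return a list of (number, suffix_tokens) for each reference in text."""
    refs = []
    for match in _REF.finditer(text):
        suffixes = [tok for tok in match.group(2).split("-") if tok]
        refs.append((int(match.group(1)), suffixes))
    return refs
```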





B.4 List of Implementation Dependencies


TABLE B-1 provides a complete list of the SPARC V9 implementation dependencies.
The Page column lists the page for the context in which the dependency is defined;
bold face indicates the main page on which the implementation dependency is
described.


TABLE B-1 SPARC V9 Implementation Dependencies (1 of 9)





Nbr Category Description Page

1-V8 f Software emulation of instructions 23
Whether an instruction complies with UltraSPARC Architecture 2005 by being
implemented directly by hardware, simulated by software, or emulated by firmware is
implementation dependent.

2-V8 v Number of IU registers 24, 48
An UltraSPARC Architecture implementation may contain from 72 to 640 general-purpose
64-bit R registers. This corresponds to a grouping of the registers into
MAXPGL + 1 sets of global R registers plus a circular stack of N_REG_WINDOWS sets of 16
registers each, known as register windows. The number of register windows present
(N_REG_WINDOWS) is implementation dependent, within the range of 3 to 32
(inclusive).

3-V8 f Incorrect IEEE Std 754-1985 results 119
An implementation may indicate that a floating-point instruction did not produce a
correct IEEE Std 754-1985 result by generating an fp_exception_other exception with
FSR.ftt = unfinished_FPop or FSR.ftt = unimplemented_FPop. In this case, software
running in a higher privilege mode shall emulate any functionality not present in the
hardware.

4, 5 Reserved.
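
The register-count range quoted for impl. dep. #2-V8 can be checked with a short calculation. The Python sketch below is illustrative only: the function name r_register_count is hypothetical, it assumes each set of global R registers holds 8 registers, and the MAXPGL values 2 and 15 used in the usage note are inferred from the 72 and 640 endpoints rather than stated in the table.

```python
def r_register_count(maxpgl, n_reg_windows):
    """Total general-purpose R registers for a given configuration.

    Assumes 8 registers per global set (an assumption of this sketch) and
    16 registers per register window, as described for impl. dep. #2-V8.
    """
    if not 3 <= n_reg_windows <= 32:
        raise ValueError("N_REG_WINDOWS must be in the range 3 to 32")
    global_sets = maxpgl + 1
    return 8 * global_sets + 16 * n_reg_windows
```

Under these assumptions, r_register_count(2, 3) gives the lower bound of 72 and r_register_count(15, 32) gives the upper bound of 640.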
TABLE B-1 SPARC V9 Implementation Dependencies (2 of 9)

Nbr Category Description Page

6-V8 f I/O registers privileged status 27
Whether I/O registers can be accessed by nonprivileged code is implementation
dependent.

7-V8 t I/O register definitions 27
The contents and addresses of I/O registers are implementation dependent.

8-V8-Cs20 RDasr/WRasr target registers 29, 67, 287, 359
Ancillary state registers (ASRs) in the range 0-27 that are not defined in UltraSPARC
Architecture 2005 are reserved for future architectural use. ASRs in the range 28-31 are
available to be used for implementation-dependent purposes.

9-V8-Cs20 RDasr/WRasr privileged status 29, 67, 287, 359
The privilege level required to execute each of the implementation-dependent read/write
ancillary state register instructions (for ASRs 28-31) is implementation
dependent.

10-V8-12-V8 Reserved.
13-V8 a (this implementation dependency applies to execution modes with greater privileges)
14-V8-15-V8 Reserved.
16-V8-Cu3 Reserved.
17-V8 Reserved.

18-V8-Ms10 f Nonstandard IEEE 754-1985 results 60, 368
UltraSPARC Architecture 2005 implementations do not implement a nonstandard
floating-point mode. FSR.ns is a reserved bit; it always reads as 0 and writes to it are
ignored.

19-V8 a FPU version, FSR.ver 60
Bits 19:17 of the FSR, FSR.ver, identify one or more implementations of the FPU
architecture.

20-V8-21-V8 Reserved.

22-V8 f FPU tem, cexc, and aexc 67
An UltraSPARC Architecture implementation implements the tem, cexc, and aexc
fields in hardware, conformant to IEEE Std 754-1985.

23-V8 Reserved.
24-V8 Reserved.

25-V8 f RDPR of FQ with nonexistent FQ 63, 291
An UltraSPARC Architecture implementation does not contain a floating-point queue
(FQ). Therefore, FSR.ftt = 4 (sequence error) does not occur, and an attempt to read
the FQ with the RDPR instruction causes an illegal_instruction exception.

26-V8-28-V8 Reserved.
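
Two of the entries on this page describe bit fields and number ranges that are easy to express in code. The following Python sketch is a hedged illustration: the names fsr_ver and asr_class are hypothetical, the field position comes from the statement that FSR.ver occupies bits 19:17, and the label "architectural" is this sketch's shorthand for ASRs 0-27 (either defined or reserved for future architectural use).

```python
FSR_VER_SHIFT = 17   # FSR.ver occupies bits 19:17 of the FSR
FSR_VER_MASK = 0b111


def fsr_ver(fsr):
    """Extract the 3-bit FSR.ver field from an FSR image."""
    return (fsr >> FSR_VER_SHIFT) & FSR_VER_MASK


def asr_class(asr):
    """Classify an ASR number per the RDasr/WRasr target register entry."""
    if not 0 <= asr <= 31:
        raise ValueError("ASR numbers range from 0 to 31")
    return "implementation-dependent" if asr >= 28 else "architectural"
```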
TABLE B-1 SPARC V9 Implementation Dependencies (3 of 9)





Nbr Category Description Page 


29-V8 t Address space identifier (ASI) definitions 109 


In SPARC V9, many ASIs were defined to be implementation dependent. Some of 
those ASIs have been allocated for standard uses in the UltraSPARC Architecture. 
Others remain implementation dependent in the UltraSPARC Architecture. See ASI 
Assignments on page 400 and Block Load and Store ASIs on page 415 for details. 


30-V8-Cu3 f ASI address decoding 109
In SPARC V9, an implementation could choose to decode only a subset of the 8-bit ASI
specifier. In UltraSPARC Architecture implementations, all 8 bits of each ASI specifier
must be decoded. Refer to Chapter 10, Address Space Identifiers (ASIs), of this
specification for details.

31-V8-Cs10 f This implementation dependency is no longer used in the UltraSPARC Architecture,
since “catastrophic” errors are now handled using normal error-reporting
mechanisms.

32-V8-Ms10 t Restartable deferred traps 428
Whether any restartable deferred traps (and associated deferred-trap queues) are
present is implementation dependent.

33-V8-Cs10 f Trap precision 431
In an UltraSPARC Architecture implementation, all exceptions that occur as the result
of program execution are precise.

34-V8 f Interrupt clearing
a: The method by which an interrupt is removed is now defined in the UltraSPARC
Architecture (see Clearing the Software Interrupt Register on page 457). 457
b: How quickly a virtual processor responds to an interrupt request, like all timing-related
issues, is implementation dependent.

35-V8-Cs20 t Implementation-dependent traps 434
Trap type (TT) values 060₁₆-07F₁₆ were reserved for
implementation_dependent_exception_n exceptions in SPARC V9 but are now all
defined as standard UltraSPARC Architecture exceptions.
36-V8 f Trap priorities 442 


The relative priorities of traps defined in the UltraSPARC Architecture are fixed. 
However, the absolute priorities of those traps are implementation dependent (because 
a future version of the architecture may define new traps). The priorities (both 
absolute and relative) of any new traps are implementation dependent. 


41-V8 Reserved.
42-V8-Cs10 t, f, v FLUSH instruction
FLUSH is implemented in hardware in all UltraSPARC Architecture 2005
implementations, so it never causes a trap as an unimplemented instruction.
43-V8 Reserved.
TABLE B-1 SPARC V9 Implementation Dependencies (4 of 9)





Nbr Category Description Page 
44-V8-Cs10 f Data access FPU trap
a: If a load floating-point instruction generates an exception that causes a non-precise
trap, it is implementation dependent whether the contents of the destination
floating-point register(s) or floating-point state register are undefined or are
guaranteed to remain unchanged. 237, 259
b: If a load floating-point alternate instruction generates an exception that causes a
non-precise trap, it is implementation dependent whether the contents of the
destination floating-point register(s) are undefined or are guaranteed to remain
unchanged. 241

45-V8-46-V8 Reserved.

47-V8-Cs20 t RDasr 288
RDasr instructions with rd in the range 28-31 are available for implementation-dependent
uses (impl. dep. #8-V8-Cs20). For an RDasr instruction with rs1 in the
range 28-31, the following are implementation dependent:
* the interpretation of bits 13:0 and 29:25 in the instruction
* whether the instruction is nonprivileged or privileged (impl. dep. #9-V8-Cs20)
* whether an attempt to execute the instruction causes an illegal_instruction exception

48-V8-Cs20 t WRasr 359
WRasr instructions with rd in the range 26-31 are available for implementation-dependent
uses (impl. dep. #8-V8-Cs20). For a WRasr instruction with rd in the range
26-31, the following are implementation dependent:
* the interpretation of bits 18:0 in the instruction
* the operation(s) performed (for example, xor) to generate the value written to the
ASR
* whether the instruction is nonprivileged or privileged (impl. dep. #9-V8-Cs20)
* whether an attempt to execute the instruction causes an illegal_instruction exception

49-V8-54-V8 Reserved.

55-V8-Cs10 f Tininess detection 66
In SPARC V9, it is implementation dependent whether “tininess” (an IEEE 754 term) is
detected before or after rounding. In all UltraSPARC Architecture implementations,
tininess is detected before rounding.

56-100 Reserved.

101-V9-Cs10 v Maximum trap level (MAXPTL) 94, 96
The architectural parameter MAXPTL is a constant for each implementation; its legal
values are from 2 to 6 (supporting from 2 to 6 levels of saved trap state). In a typical
implementation MAXPTL = MAXPGL (see impl. dep. #401-S10).
Architecturally, MAXPTL must be ≥ 2.

102-V9 f Clean windows trap 445
An implementation may choose either to implement automatic “cleaning” of register
windows in hardware or to generate a clean_window trap, when needed, for
window(s) to be cleaned by software.
TABLE B-1 SPARC V9 Implementation Dependencies (5 of 9)





Nbr Category Description Page 
103-V9-Ms10 f Prefetch instructions
The following aspects of the PREFETCH and PREFETCHA instructions are
implementation dependent:
a: the attributes of the block of memory prefetched: its size (minimum = 64 bytes)
and its alignment (minimum = 64-byte alignment) 281
b: whether each defined prefetch variant is implemented (1) as a NOP, (2) with its
full semantics, or (3) with common-case prefetching semantics 281, 284
c: whether and how variants 16, 18, 19 and 24-31 are implemented; if not
implemented, a variant must execute as a NOP 285

The following aspects of the PREFETCH and PREFETCHA instructions used to be (but
are no longer) implementation dependent:
d: while in nonprivileged mode (PSTATE.priv = 0), an attempt to reference an ASI in
the range 0₁₆..7F₁₆ by a PREFETCHA instruction executes as a NOP; specifically,
it does not cause a privileged_action exception. —
e: PREFETCH and PREFETCHA have no observable effect in privileged code —
g: while in privileged mode (PSTATE.priv = 1), an attempt to reference an ASI in the
range 30₁₆..7F₁₆ by a PREFETCHA instruction executes as a NOP (specifically, it
does not cause a privileged_action exception) —

105-V9 f TICK register 72
a: If an accurate count cannot always be returned when TICK is read, any inaccuracy
should be small, bounded, and documented.
b: An implementation may implement fewer than 63 bits in TICK.counter; however,
the counter as implemented must be able to count for at least 10 years without
overflowing. Any upper bits not implemented must read as 0.

106-V9 f IMPDEP2A instructions 223
The IMPDEP2A instructions are completely implementation dependent.
Implementation-dependent aspects include their operation, the interpretation of bits
29:25 and 18:0 in their encodings, and which (if any) exceptions they may cause.

107-V9 f Unimplemented LDTW(A) trap
a: It is implementation dependent whether LDTW is implemented in hardware. If
not, an attempt to execute an LDTW instruction will cause an
unimplemented_LDTW exception. 250
b: It is implementation dependent whether LDTWA is implemented in hardware. If
not, an attempt to execute an LDTWA instruction will cause an
unimplemented_LDTW exception. 253

108-V9 f Unimplemented STTW(A) trap
a: It is implementation dependent whether STTW is implemented in hardware. If not,
an attempt to execute an STTW instruction will cause an unimplemented_STTW
exception. 334
b: It is implementation dependent whether STTWA is implemented in hardware. If not,
an attempt to execute an STTWA instruction will cause an unimplemented_STTW
exception. 337
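
The 10-year requirement on TICK.counter (impl. dep. #105-V9, item b) translates into a minimum counter width for a given clock rate. The Python sketch below is a worked illustration under stated assumptions: the name min_tick_counter_bits is hypothetical, the counter is assumed to increment once per cycle at a fixed frequency, and a year is taken as 365.25 days.

```python
# 10 years of 365.25 days each, in seconds (an assumption of this sketch).
SECONDS_PER_10_YEARS = 10 * 36525 * 24 * 3600 // 100


def min_tick_counter_bits(freq_hz):
    """Smallest counter width that does not overflow within 10 years.

    A b-bit counter holds values up to 2**b - 1, so the width needed to
    reach `ticks` without wrapping is ticks.bit_length().
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    ticks = freq_hz * SECONDS_PER_10_YEARS
    return ticks.bit_length()
```

Under these assumptions a 1 GHz counter needs 59 bits and a 4 GHz counter needs 61 bits, both within the 63 bits available to TICK.counter.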
TABLE B-1 SPARC V9 Implementation Dependencies (6 of 9)








Nbr Category Description Page 
109-V9-Cs10 f LDDF(A)_mem_address_not_aligned
a: LDDF requires only word alignment. However, if the effective address is word-aligned
but not doubleword-aligned, an attempt to execute a valid (i = 1 or
instruction bits 12:5 = 0) LDDF instruction may cause an
LDDF_mem_address_not_aligned exception. In this case, the trap handler software
shall emulate the LDDF instruction and return. 102, 102, 237, 448
(In an UltraSPARC Architecture processor, the LDDF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the LDDF
instruction)
b: LDDFA requires only word alignment. However, if the effective address is word-aligned
but not doubleword-aligned, an attempt to execute a valid (i = 1 or
instruction bits 12:5 = 0) LDDFA instruction may cause an
LDDF_mem_address_not_aligned exception. In this case, the trap handler software
shall emulate the LDDFA instruction and return. 240
(In an UltraSPARC Architecture processor, the LDDF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the LDDFA
instruction)

110-V9-Cs10 f STDF(A)_mem_address_not_aligned
a: STDF requires only word alignment in memory. However, if the effective address is
word-aligned but not doubleword-aligned, an attempt to execute a valid (i = 1 or
instruction bits 12:5 = 0) STDF instruction may cause an
STDF_mem_address_not_aligned exception. In this case, the trap handler software
must emulate the STDF instruction and return. 102, 321, 449
(In an UltraSPARC Architecture processor, the STDF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the STDF
instruction)
b: STDFA requires only word alignment in memory. However, if the effective address
is word-aligned but not doubleword-aligned, an attempt to execute a valid (i = 1 or
instruction bits 12:5 = 0) STDFA instruction may cause an
STDF_mem_address_not_aligned exception. In this case, the trap handler software
must emulate the STDFA instruction and return. 324
(In an UltraSPARC Architecture processor, the STDF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the STDFA
instruction)
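
The alignment case that impl. dep. #109 and #110 single out (word-aligned but not doubleword-aligned) can be identified with two modulus checks. The Python sketch below is illustrative; the function name classify_fp_double_address and its return labels are hypothetical, and it only classifies the address rather than modeling which exception an implementation raises.

```python
def classify_fp_double_address(addr):
    """Classify an effective address for an LDDF/STDF-style access.

    "word-aligned only" is the case in which the entries above say an
    LDDF_mem_address_not_aligned or STDF_mem_address_not_aligned
    exception may occur and trap handler software emulates the access.
    """
    if addr % 4 != 0:
        return "not word-aligned"
    if addr % 8 != 0:
        return "word-aligned only"
    return "doubleword-aligned"
```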











486 UltraSPARC Architecture 2005 + Draft DO.9.2, 19 Jun 2008 


TABLE B-1 SPARC V9 Implementation Dependencies (7 of 9)

Nbr Category Description Page

111-V9-Cs10 f LDQF(A)_mem_address_not_aligned
a: LDQF requires only word alignment. However, if the effective address is word-aligned
but not quadword-aligned, an attempt to execute an LDQF instruction may
cause an LDQF_mem_address_not_aligned exception. In this case, the trap handler
software must emulate the LDQF instruction and return. 103, 102, 237, 450
(In an UltraSPARC Architecture processor, the LDQF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the LDQF
instruction)
(this exception does not occur in hardware on UltraSPARC Architecture 2005
implementations, because they do not implement the LDQF instruction in
hardware)
b: LDQFA requires only word alignment. However, if the effective address is word-aligned
but not quadword-aligned, an attempt to execute an LDQFA instruction
may cause an LDQF_mem_address_not_aligned exception. In this case, the trap
handler software must emulate the LDQFA instruction and return. 240
(In an UltraSPARC Architecture processor, the LDQF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the LDQFA
instruction)
(this exception does not occur in hardware on UltraSPARC Architecture 2005
implementations, because they do not implement the LDQFA instruction in
hardware)

112-V9-Cs10 f STQF(A)_mem_address_not_aligned
a: STQF requires only word alignment in memory. However, if the effective address is
word-aligned but not quadword-aligned, an attempt to execute an STQF instruction
may cause an STQF_mem_address_not_aligned exception. In this case, the trap
handler software must emulate the STQF instruction and return. 103, 322, 450
(In an UltraSPARC Architecture processor, the STQF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the STQF
instruction)
(this exception does not occur in hardware on UltraSPARC Architecture 2005
implementations, because they do not implement the STQF instruction in
hardware)
b: STQFA requires only word alignment in memory. However, if the effective address
is word-aligned but not quadword-aligned, an attempt to execute an STQFA
instruction may cause an STQF_mem_address_not_aligned exception. In this case,
the trap handler software must emulate the STQFA instruction and return. 324
(In an UltraSPARC Architecture processor, the STQF_mem_address_not_aligned
exception occurs in this case and trap handler software emulates the STQFA
instruction)
(this exception does not occur in hardware on UltraSPARC Architecture 2005
implementations, because they do not implement the STQFA instruction in
hardware)
TABLE B-1 SPARC V9 Implementation Dependencies (8 of 9)
Nbr Category Description Page

113-V9-Ms10 f Implemented memory models 91, 388
Whether memory models represented by PSTATE.mm = 10₂ or 11₂ are supported in an
UltraSPARC Architecture processor is implementation dependent. If the 10₂ model is
supported, then when PSTATE.mm = 10₂ the implementation must correctly execute
software that adheres to the RMO model described in The SPARC Architecture Manual-
Version 9. If the 11₂ model is supported, its definition is implementation dependent.

118-V9 f Identifying I/O locations 380
The manner in which I/O locations are identified is implementation dependent.

119-Ms10 f Unimplemented values for PSTATE.mm 91, 389
The effect of an attempt to write an unsupported memory model designation into
PSTATE.mm is implementation dependent; however, it should never result in a value
of PSTATE.mm greater than the one that was written. In the case of an
UltraSPARC Architecture implementation that only supports the TSO memory model,
PSTATE.mm always reads as zero and attempts to write to it are ignored.

120-V9 f Coherence and atomicity of memory operations 380
The coherence and atomicity of memory operations between virtual processors and
I/O DMA memory accesses are implementation dependent.

121-V9 f Implementation-dependent memory model 380
An implementation may choose to identify certain addresses and use an
implementation-dependent memory model for references to them.

122-V9 f FLUSH latency 174, 396
The latency between the execution of FLUSH on one virtual processor and the point at
which the modified instructions have replaced outdated instructions in a
multiprocessor is implementation dependent.

123-V9 f Input/output (I/O) semantics 27
The semantic effect of accessing I/O registers is implementation dependent.

124-V9 v Implicit ASI when TL > 0 383
In SPARC V9, when TL > 0, the implicit ASI for instruction fetches, loads, and stores is
implementation dependent. In all UltraSPARC Architecture implementations, when
TL > 0, the implicit ASI for instruction fetches is ASI_NUCLEUS; loads and stores will
use ASI_NUCLEUS if PSTATE.cle = 0 or ASI_NUCLEUS_LITTLE if PSTATE.cle = 1.

125-V9-Cs10 f Address masking 93, 93, 150, 226, 288, 443
(1) When PSTATE.am = 1, only the less-significant 32 bits of the PC register are stored
in the specified destination register(s) in CALL, JMPL, and RDPC instructions, while
the more-significant 32 bits of the destination register(s) are set to 0.
(2) When PSTATE.am = 1, during a trap, only the less-significant 32 bits of the PC and
NPC are stored (respectively) to TPC[TL] and TNPC[TL]; the more-significant 32 bits
of TPC[TL] and TNPC[TL] are set to 0.
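
The address-masking behavior described for impl. dep. #125 amounts to clearing the upper 32 bits of the stored value when PSTATE.am = 1. The Python sketch below illustrates that rule only; the name stored_pc is hypothetical, and passing PSTATE.am as a plain argument is an assumption of this sketch.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def stored_pc(pc, pstate_am):
    """Value stored for PC (or NPC) under address masking.

    With PSTATE.am = 1 only the less-significant 32 bits are kept and the
    more-significant 32 bits are set to 0; otherwise all 64 bits are kept.
    """
    pc &= MASK64
    return pc & MASK32 if pstate_am else pc
```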
TABLE B-1 SPARC V9 Implementation Dependencies (9 of 9)





Nbr Category Description Page 
126-V9-Ms10 Register Windows State registers width 82
Privileged registers CWP, CANSAVE, CANRESTORE, OTHERWIN, and CLEANWIN
contain values in the range 0 to N_REG_WINDOWS − 1. An attempt to write a value
greater than N_REG_WINDOWS − 1 to any of these registers causes an implementation-dependent
value between 0 and N_REG_WINDOWS − 1 (inclusive) to be written to the
register. Furthermore, an attempt to write a value greater than N_REG_WINDOWS − 2
violates the register window state definition in Register Window Management
Instructions on page 116.

Although the width of each of these five registers is architecturally 5 bits, the width is
implementation dependent and shall be between ⌈log₂(N_REG_WINDOWS)⌉ and 5 bits,
inclusive. If fewer than 5 bits are implemented, the unimplemented upper bits shall
read as 0 and writes to them shall have no effect. All five registers should have the
same width.

For UltraSPARC Architecture 2005 processors, N_REG_WINDOWS = 8. Therefore, each
register window state register is implemented with 3 bits, the maximum value for
CWP and CLEANWIN is 7, and the maximum value for CANSAVE, CANRESTORE,
and OTHERWIN is 6. When these registers are written by the WRPR instruction, bits
63:3 of the data written are ignored.


127-199 Reserved. — 
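
The register-width rule in impl. dep. #126 can be worked through numerically. The following Python sketch is a minimal illustration: the names window_register_bits and wrpr_window_value are hypothetical, and the write helper models only the N_REG_WINDOWS = 8 case described above, where bits 63:3 of the written data are ignored.

```python
def window_register_bits(n_reg_windows):
    """Minimum width, ceil(log2(N_REG_WINDOWS)), of a window state register."""
    if not 3 <= n_reg_windows <= 32:
        raise ValueError("N_REG_WINDOWS must be in the range 3 to 32")
    # For n >= 2, (n - 1).bit_length() equals ceil(log2(n)).
    return (n_reg_windows - 1).bit_length()


def wrpr_window_value(data):
    """Value retained by a 3-bit window state register (N_REG_WINDOWS = 8)."""
    bits = window_register_bits(8)
    return data & ((1 << bits) - 1)
```

With N_REG_WINDOWS = 8 the minimum width is 3 bits, which matches the 3-bit registers described for UltraSPARC Architecture 2005 processors.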


TABLE B-2 provides a list of implementation dependencies that, in addition to those 
in TABLE B-1, apply to UltraSPARC Architecture processors. Bold face indicates the 
main page on which the implementation dependency is described. See Appendix C 
in the Extensions Documents for further information. 


TABLE B-2 UltraSPARC Architecture Implementation Dependencies (1 of 6) 
Nbr Description Page 





200-201 Reserved. — 
203-U3-Cs10 Dispatch Control register (DCR) bits 13:6 and 1
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.

204-U3-Cs10 DCR bits 5:3 and 0
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.

205-U3-Cs10 Instruction Trap Register
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.

206-U3-Cs10 SHUTDOWN instruction 307
On an UltraSPARC Architecture implementation executing in privileged mode,
SHUTDOWN behaves like a NOP.

207-U3 PCR register bits 47:32, 26:17, and 3 75
The values and semantics of bits 47:32, 26:17, and bit 3 of the PCR register are
implementation dependent.


208-U3 Ordering of errors captured in instruction execution — 
The order in which errors are captured in instruction execution is implementation 
dependent. Ordering may be in program order or in order of detection. 
TABLE B-2 UltraSPARC Architecture Implementation Dependencies (2 of 6)





Nbr 
209-U3 


211-U3 


212-U3- 
Cs10 


213-U3 


228-U3- 
Cs10 


229-U3- 
Cs10 


230 
230-U3 


232-U3- 
Cs10 


233-U3- 
Cs10 


235-U3- 
Cs10 


236-U3- 
Cs10 


239-U3- 
Cs10 


240-U3- 
Cs10 


243-U3 


Description Page 


Software intervention after instruction-induced error — 
Precision of the trap to signal an instruction-induced error of which recovery requires 
software intervention is implementation dependent. 


Error logging registers' information —
The information that the error logging registers preserve beyond the reset induced by an
ERROR signal is implementation dependent.


Trap with fatal error — 
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. 


AFSR.priv —
The existence of the AFSR.priv bit is implementation dependent. If AFSR.priv is
implemented, it is implementation dependent whether the logged AFSR.priv indicates the
privileged state upon the detection of an error or upon the execution of an instruction that 
induces the error. For the former implementation to be effective, operating software must 
provide error barriers appropriately. 


This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. —


This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. —


TSB Base address generation —

Whether the implementation generates the TSB Base address by exclusive-ORing the TSB 
Base register and a TSB register or by taking the tsb_base field directly from a TSB register 

is implementation dependent in UltraSPARC Architecture. This implementation 

dependency existed for UltraSPARC III/IV, only to maintain compatibility with the TLB 
miss handling software of UltraSPARC I/II. 


Reserved. — 


data access exception trap — 
The causes of a data access exception trap are implementation dependent in UltraSPARC 
Architecture 2005. 

This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. = 
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. — 
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. = 
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.t — 
This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. 


Reserved. — 


This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. — 
TABLE B-2 UltraSPARC Architecture Implementation Dependencies (3 of 6)


Nbr Description Page


244-U3-Cs10 Data Watchpoint Reliability —
Data Watchpoint traps are completely implementation dependent in UltraSPARC
Architecture processors.


245-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. —


248-U3 Conditions for fp_exception_other with unfinished_FPop 62
The conditions under which an fp_exception_other exception with floating-point trap type
of unfinished_FPop can occur are implementation dependent. An implementation may
cause fp_exception_other with unfinished_FPop under a different (but specified) set of
conditions.


249-U3-Cs10 Data Watchpoint for Partial Store Instruction 331
For an STPARTIAL instruction, the following aspects of data watchpoints are
implementation dependent: (a) whether data watchpoint logic examines the byte store
mask in R[rs2] or it conservatively behaves as if every Partial Store always stores all 8
bytes, and (b) whether data watchpoint logic examines individual bits in the Virtual
(Physical) Data Watchpoint Mask in the LSU Control register to determine which bytes are
being watched or (when the Watchpoint Mask is nonzero) it conservatively behaves as if
all 8 bytes are being watched.


250-U3-Cs10 PCR accessibility when PSTATE.priv = 0 74, 289, 360
In an UltraSPARC Architecture implementation, PCR is never accessible to nonprivileged
software. Specifically, when a virtual processor is operating in nonprivileged mode
(PSTATE.priv = 0), an attempt to access PCR (using an RDPCR or a WRPCR instruction)
results in a privileged_opcode exception.


251 Reserved.


252-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. —


253-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005. —


257-U3 LDDFA with ASI C0₁₆-C5₁₆ or C8₁₆-CD₁₆ and misaligned memory address 241
If an LDDFA opcode is used with an ASI of C0₁₆-C5₁₆ or C8₁₆-CD₁₆ (Partial Store ASIs,
which are an illegal combination with LDDFA) and a memory address is specified with
less than 8-byte alignment, the virtual processor generates an exception. It is
implementation dependent whether the exception generated is data_access_exception,
mem_address_not_aligned, or LDDF_mem_address_not_aligned.


259-299 Reserved.


300-U4-Cs10 Attempted access to ASI registers with LDTWA 254
If an LDTWA instruction referencing a non-memory ASI is executed, it generates a
data_access_exception exception.


301-U4-Cs10 Attempted access to ASI registers with STTWA 337
If an STTWA instruction referencing a non-memory ASI is executed, it generates a
data_access_exception exception.
TABLE B-2 UltraSPARC Architecture Implementation Dependencies (4 of 6)


Nbr Description Page


302-U4-Cs10 Scratchpad registers 417
An UltraSPARC Architecture processor includes eight privileged Scratchpad registers (64
bits each, read/write accessible).


303-U4-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


305-U4-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


306-U4-Cs10 Trap type generated upon attempted access to noncacheable page with LDTXA 256
When an LDTXA instruction attempts access from an address that is not mapped to
cacheable memory space, a data_access_exception exception is generated.


307-U4-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


308-U3-Cs10 This implementation dependency no longer applies, as of UltraSPARC Architecture 2005.


309-U4-Cs10 Reserved.


311-319 Reserved.


327-399 Reserved.


400-S10 Global Level register (GL) implementation 96
Although GL is defined as a 4-bit register, an implementation may implement any subset
of those bits sufficient to encode the values from 0 to MAXPGL for that implementation. If
any bits of GL are not implemented, they read as zero and writes to them are ignored.


401-S10 Maximum Global Level (MAXPGL) 94, 96
The architectural parameter MAXPGL is a constant for each implementation; its legal values
are from 2 to 15 (supporting from 3 to 16 sets of global registers). In a typical
implementation MAXPGL = MAXPTL (see impl. dep. #101-V9-CS10).
Architecturally, MAXPTL must be ≥ 2.


403-S10 Setting of “dirty” bits in FPRS 74, 74
A “dirty” bit (du or dl) in the FPRS register must be set to ‘1’ if any of its corresponding F
registers is actually modified. The specific conditions under which a dirty bit is set are
implementation dependent.


404-S10 Scratchpad registers 4 through 7 417
The degree to which Scratchpad registers 4-7 are accessible to privileged software is
implementation dependent. Each may be (1) fully accessible, (2) accessible, with access
much slower than to Scratchpad registers 0-3, or (3) inaccessible (causing a
data_access_exception exception).
TABLE B-2 UltraSPARC Architecture Implementation Dependencies (5 of 6)


Nbr Description Page


405-S10 Virtual address range 26
An UltraSPARC Architecture implementation may support a full 64-bit virtual address
space or a more limited range of virtual addresses. In an implementation that does not
support a full 64-bit virtual address space, the supported range of virtual addresses is
restricted to two equal-sized ranges at the extreme upper and lower ends of 64-bit
addresses; that is, for n-bit virtual addresses, the valid address ranges are 0 to 2^(n-1) - 1 and
2^64 - 2^(n-1) to 2^64 - 1.
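
As an illustration of impl. dep. #405-S10, the following Python sketch checks whether a 64-bit address falls in one of the two valid ranges for an implementation with n-bit virtual addresses. The function name `is_valid_va` and the value n = 48 used in the usage note are assumptions chosen for illustration; they are not defined by the architecture.

```python
def is_valid_va(addr: int, n: int) -> bool:
    """Return True if addr lies in one of the two valid virtual address ranges
    for an implementation with n-bit virtual addresses (hypothetical helper).

    Valid ranges: 0 .. 2^(n-1) - 1  and  2^64 - 2^(n-1) .. 2^64 - 1.
    """
    if not 1 <= n <= 64:
        raise ValueError("n must be between 1 and 64")
    if not 0 <= addr < (1 << 64):
        raise ValueError("addr must be an unsigned 64-bit value")
    half = 1 << (n - 1)                      # size of each of the two ranges
    return addr < half or addr >= (1 << 64) - half
```

For example, with n = 48 (an assumed value), `is_valid_va(0x0000_7FFF_FFFF_FFFF, 48)` is true, while the next address, `0x0000_8000_0000_0000`, falls in the unsupported hole between the two ranges. With n = 64 every 64-bit address is valid.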


409-S10-Cs20 FLUSH instruction and memory consistency 175
The implementation of the FLUSH instruction is implementation dependent.
If the implementation automatically maintains consistency between instruction and data
memory,

(1) the FLUSH address is ignored and
(2) the FLUSH instruction cannot cause any data access exceptions, because its effective
address operand is not translated or used by the MMU.

On the other hand, if the implementation does not maintain consistency between
instruction and data memory, the FLUSH address is used to access the MMU and the
FLUSH instruction can cause data access exceptions.


410-S10 Block Load behavior 233
The following aspects of the behavior of block load (LDBLOCKF) instructions are
implementation dependent:

* What memory ordering model is used by LDBLOCKF (LDBLOCKF is not required to
follow TSO memory ordering)

* Whether LDBLOCKF follows memory ordering with respect to stores (including block
stores), including whether the virtual processor detects read-after-write and write-after-read
hazards to overlapping addresses

* Whether LDBLOCKF appears to execute out of order, or follow LoadLoad ordering
(with respect to older loads, younger loads, and other LDBLOCKFs)

* Whether LDBLOCKF follows register-dependency interlocks, as do ordinary load
instructions

* Whether LDBLOCKFs to non-cacheable locations are
(a) strictly ordered,
(b) not strictly ordered and cause a data_access_exception exception, or
(c) not strictly ordered and silently execute without causing an exception (option (c) is
strongly discouraged)

* Whether the MMU ignores the side-effect bit (TTE.e) for LDBLOCKF accesses 380
(in which case, LDBLOCKFs behave as if TTE.e = 0)

* Whether VA_watchpoint exceptions are recognized on accesses to all 64 bytes of a 234, 234
LDBLOCKF (the recommended behavior), or only on accesses to the first eight bytes
TABLE B-2 UltraSPARC Architecture Implementation Dependencies (6 of 6)


Nbr Description Page


411-S10 Block Store behavior 319, 319
The following aspects of the behavior of block store (STBLOCKF) instructions are
implementation dependent:

* The memory ordering model that STBLOCKF follows (other than as constrained by the
rules outlined on page 319).

* Whether VA_watchpoint exceptions are recognized on accesses to all 64 bytes of a
STBLOCKF (the recommended behavior), or only on accesses to the first eight bytes.

* Whether STBLOCKFs to non-cacheable pages execute in strict program order or not. If
not, a STBLOCKF to a non-cacheable page causes a data_access_exception exception.

* Whether STBLOCKF follows register dependency interlocks (as ordinary stores do).

* Whether a non-Commit STBLOCKF forces the data to be written to memory and
invalidates copies in all caches present (as the Commit variants of STBLOCKF do).

* Whether the MMU ignores the side-effect bit (TTE.e) for STBLOCKF accesses 380
(in which case, STBLOCKFs behave as if TTE.e = 0)

* Any other restrictions on the behavior of STBLOCKF, as described in
implementation-specific documentation.


412-S10 MEMBAR behavior 262
An UltraSPARC Architecture implementation may define the operation of each MEMBAR
variant in any manner that provides the required semantics.


413-S10 Load Twin Extended Word behavior 256
It is implementation dependent whether VA_watchpoint exceptions are recognized on
accesses to all 16 bytes of an LDTXA instruction (the recommended behavior) or only on
accesses to the first 8 bytes.


414 Reserved. —


417-S10 Behavior of DONE and RETRY when TSTATE[TL].pstate.am = 1 93, 154, 296
If (1) TSTATE[TL].pstate.am = 1 and (2) a DONE or RETRY instruction is executed (which
sets PSTATE.am to ‘1’ by restoring the value from TSTATE[TL].pstate.am to PSTATE.am),
it is implementation dependent whether the DONE or RETRY instruction masks (zeroes)
the more-significant 32 bits of the values it places into PC and NPC.


442-S10 STICK register 81

a: If an accurate count cannot always be returned when STICK is read, any inaccuracy
should be small, bounded, and documented.

b: An implementation may implement fewer than 63 bits in STICK.counter; however, the
counter as implemented must be able to count for at least 10 years without
overflowing. Any upper bits not implemented must read as 0.
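
The 10-year requirement in item b can be turned into a minimum counter width once a counting frequency is fixed. The sketch below does that arithmetic in Python. The 1 GHz frequency in the usage note and the use of a 365.25-day year are assumptions made for illustration; the architecture specifies neither, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 31_557_600  # assumed: 365.25 days of 86,400 seconds


def years_until_overflow(counter_bits: int, freq_hz: float) -> float:
    """Years a free-running counter of counter_bits bits runs before wrapping."""
    return (2 ** counter_bits) / freq_hz / SECONDS_PER_YEAR


def min_counter_bits(freq_hz: int, years: int = 10) -> int:
    """Smallest width whose maximum value (2^bits - 1) is at least the number
    of ticks elapsed in the given number of years."""
    ticks = years * SECONDS_PER_YEAR * freq_hz
    return ticks.bit_length()
```

At an assumed 1 GHz, `min_counter_bits(1_000_000_000)` gives 59, and a full 63-bit counter would run for roughly 292 years before overflowing.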


444—449 Reserved for UltraSPARC Architecture 2005 


450 Reserved for future use 
and up 


450-499 Reserved for UltraSPARC Architecture 2007 





494 UltraSPARC Architecture 2005 • Draft D0.9.2, 19 Jun 2008


APPENDIX C 





Assembly Language Syntax 


This appendix supports Chapter 7, Instructions. Each instruction description in 
Chapter 7 includes a table that describes the suggested assembly language format 
for that instruction. This appendix describes the notation used in those assembly 
language syntax descriptions and lists some synthetic instructions provided by 
UltraSPARC Architecture assemblers for the convenience of assembly language 
programmers. 


The appendix contains these sections: 


m Notation Used on page 495. 
m Syntax Design on page 501. 
m Synthetic Instructions on page 502. 





C.1 Notation Used 


The notations defined here are also used in the assembly language syntax 
descriptions in Chapter 7, Instructions. 


Items in typewriter font are literals to be written exactly as they appear. Items 
in italic font are metasymbols that are to be replaced by numeric or symbolic values 
in actual SPARC V9 assembly language code. For example, “imm_asi” would be 
replaced by a number in the range 0 to 255 (the value of the imm_asi bits in the 
binary instruction) or by a symbol bound to such a number. 


Subscripts on metasymbols further identify the placement of the operand in the
generated binary instruction. For example, reg_rs2 is a reg (register name) whose
binary value will be placed in the rs2 field of the resulting instruction.


C.1.1 Register Names


reg. A reg is an integer register name. It can have any of the following values:¹
%r0-%r31
%g0-%g7 (global registers; same as %r0-%r7)
%o0-%o7 (out registers; same as %r8-%r15)
%l0-%l7 (local registers; same as %r16-%r23)
%i0-%i7 (in registers; same as %r24-%r31)
%fp (frame pointer; conventionally same as %i6)
%sp (stack pointer; conventionally same as %o6)


Subscripts identify the placement of the operand in the binary instruction as one of
the following:


reg_rs1 (rs1 field)
reg_rs2 (rs2 field)
reg_rd (rd field)


freg. An freg is a floating-point register name. It may have the following values:
%f0, %f1, %f2, ... %f31
%f32, %f34, ... %f60, %f62 (even-numbered only, from %f32 to %f62)
%d0, %d2, %d4, ... %d60, %d62 (%dn, where n mod 2 = 0, only)
%q0, %q4, %q8, ... %q56, %q60 (%qn, where n mod 4 = 0, only)


See Floating-Point Registers on page 52 for a detailed description of how the
single-precision, double-precision, and quad-precision floating-point registers
overlap.


Subscripts further identify the placement of the operand in the binary instruction as
one of the following:

freg_rs1 (rs1 field)
freg_rs2 (rs2 field)
freg_rs3 (rs3 field)
freg_rd (rd field)


asr_reg. An asr_reg is an Ancillary State Register name. It may have one of the
following values:
%asr16-%asr31


Subscripts further identify the placement of the operand in the binary instruction as
one of the following:


asr_reg_rs1 (rs1 field)
asr_reg_rd (rd field)


1. In actual usage, the %sp, %fp, %gn, %on, %ln, and %in forms are preferred over %rn.


i_or_x_cc. An i_or_x_cc specifies a set of integer condition codes, those based on
either the 32-bit result of an operation (icc) or on the full 64-bit result (xcc). It may
have either of the following values:

%icc
%xcc


fccn. An fccn specifies a set of floating-point condition codes. It can have any of
the following values:


%fcc0
%fcc1
%fcc2
%fcc3


C.1.2 Special Symbol Names


Certain special symbols appear in the syntax table in typewriter font. They must be 
written exactly as they are shown, including the leading percent sign (%). 


The symbol names and the registers or operators to which they refer are as follows: 








%asi          Address Space Identifier (ASI) register
%canrestore   Restorable Windows register
%cansave      Savable Windows register
%ccr          Condition Codes register
%cleanwin     Clean Windows register
%cwp          Current Window Pointer (CWP) register
%fprs         Floating-Point Registers State (FPRS) register
%fsr          Floating-Point State register
%gsr          General Status Register (GSR)
%otherwin     Other Windows (OTHERWIN) register
%pc           Program Counter (PC) register
%pcr          Performance Control Register (PCR)
%pic          Performance Instrumentation Counters
%pil          Processor Interrupt Level register
%pstate       Processor State register
%softint      Soft Interrupt register
%softint_clr  Soft Interrupt register (clear selected bits)
%softint_set  Soft Interrupt register (set selected bits)
%stick†       System Timer (STICK) register
%stick_cmpr†  System Timer Compare (STICK_CMPR) register
%tba          Trap Base Address (TBA) register
%tick         Cycle count (TICK) register
%tick_cmpr    Timer Compare (TICK_CMPR) register
%tl           Trap Level (TL) register
%tnpc         Trap Next Program Counter (TNPC) register
%tpc          Trap Program Counter (TPC) register
%tstate       Trap State (TSTATE) register
%tt           Trap Type (TT) register
%wstate       Window State register
%y            Y register


† The original assembly language names for %stick and %stick_cmpr were, respectively, %sys_tick and
%sys_tick_cmpr, which are now deprecated. Over time, assemblers will support the new %stick and
%stick_cmpr names for these registers (which are consistent with %tick and %tick_cmpr). In the
meantime, some existing assemblers may only recognize the original names.


The following special symbol names are prefix unary operators that perform the 
functions described, on an argument that is a constant, symbol, or expression that 
evaluates to a constant offset from a symbol: 


%hh         Extracts bits 63:42 (high 22 bits of upper word) of its operand
%hm         Extracts bits 41:32 (low-order 10 bits of upper word) of its operand
%hi or %lm  Extracts bits 31:10 (high-order 22 bits of low-order word) of its operand
%lo         Extracts bits 9:0 (low-order 10 bits) of its operand


For example, the value of "%lo(symbol)" is the least-significant 10 bits of symbol.
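
The four extraction operators can be modeled with shifts and masks. The following Python sketch is an illustration of the bit ranges listed above, not assembler source; the function names and the `reassemble` helper are hypothetical.

```python
MASK10 = 0x3FF      # 10 one-bits
MASK22 = 0x3FFFFF   # 22 one-bits


def hh(value: int) -> int:
    """Model of %hh: bits 63:42 of the operand."""
    return (value >> 42) & MASK22


def hm(value: int) -> int:
    """Model of %hm: bits 41:32 of the operand."""
    return (value >> 32) & MASK10


def hi(value: int) -> int:
    """Model of %hi (also written %lm): bits 31:10 of the operand."""
    return (value >> 10) & MASK22


def lo(value: int) -> int:
    """Model of %lo: bits 9:0 of the operand."""
    return value & MASK10


def reassemble(value: int) -> int:
    """Recombine the four fields; the result equals the low 64 bits of value."""
    return (hh(value) << 42) | (hm(value) << 32) | (hi(value) << 10) | lo(value)
```

Because the four fields cover bits 63:0 without overlap, recombining them reproduces any 64-bit operand, which is why a worst-case 64-bit constant can be built from one use of each operator.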


Certain predefined value names appear in the syntax table in typewriter font. 
They must be written exactly as they are shown, including the leading sharp sign 
(#). The value names and the constant values to which they are bound are listed in 


TABLE C-1. 


TABLE C-1 Value Names and Values


Value Name in Assembly Language    Value        Comments


for PREFETCH instruction “fcn” field

#n_reads                           0
#one_read                          1
#n_writes                          2
#one_write                         3
#page                              4
#unified                           17 (11₁₆)
#n_reads_strong                    20 (14₁₆)
#one_read_strong                   21 (15₁₆)
#n_writes_strong                   22 (16₁₆)
#one_write_strong                  23 (17₁₆)


for MEMBAR instruction “mmask” field

#LoadLoad                          01₁₆
#StoreLoad                         02₁₆
#LoadStore                         04₁₆
#StoreStore                        08₁₆


for MEMBAR instruction “cmask” field

#Lookaside                         10₁₆
#MemIssue                          20₁₆
#Sync                              40₁₆


C.1.3 Values 


Some instructions use operand values as follows: 


const4 A constant that can be represented in 4 bits 

const22 A constant that can be represented in 22 bits 

imm_asi An alternate address space identifier (0-255)

siam_mode A 3-bit mode value for the SIAM instruction

simm7 A signed immediate constant that can be represented in 7 bits 
simm8 A signed immediate constant that can be represented in 8 bits 
simm10 A signed immediate constant that can be represented in 10 bits 
simm11 A signed immediate constant that can be represented in 11 bits 
simm13 A signed immediate constant that can be represented in 13 bits 
value Any 64-bit value 

shcnt32 A shift count from 0-31 

shcnt64 A shift count from 0-63 


C.1.4 Labels 


A label is a sequence of characters that comprises alphabetic letters (a-z, A-Z [with
upper and lower case distinct]), underscores (_), dollar signs ($), periods (.), and
decimal digits (0-9). A label may contain decimal digits, but it may not begin with
one. A local label contains digits only.
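
The label rules above map directly onto two regular expressions. The Python sketch below is one way to express them; the function name `classify_label` and its return values are hypothetical, and a real assembler may impose further restrictions (for example, on length) that this section does not describe.

```python
import re

# Ordinary label: letters, underscore, dollar sign, period, and digits,
# but the first character may not be a digit.
LABEL_RE = re.compile(r"[A-Za-z_$.][A-Za-z0-9_$.]*\Z")

# Local label: decimal digits only.
LOCAL_LABEL_RE = re.compile(r"[0-9]+\Z")


def classify_label(text: str):
    """Return "local", "label", or None if text is not a valid label."""
    if LOCAL_LABEL_RE.match(text):
        return "local"
    if LABEL_RE.match(text):
        return "label"
    return None
```

Case is preserved by both patterns, so `Loop` and `loop` are accepted as distinct labels.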
C.1.5 Other Operand Syntax


Some instructions allow several operand syntaxes, as follows:


reg_plus_imm Can be any of the following:


reg_rs1 (equivalent to reg_rs1 + %g0)
reg_rs1 + simm13
reg_rs1 - simm13
simm13 (equivalent to %g0 + simm13)
simm13 + reg_rs1 (equivalent to reg_rs1 + simm13)


address Can be any of the following:


reg_rs1 (equivalent to reg_rs1 + %g0)
reg_rs1 + simm13
reg_rs1 - simm13
simm13 (equivalent to %g0 + simm13)
simm13 + reg_rs1 (equivalent to reg_rs1 + simm13)
reg_rs1 + reg_rs2


membar_mask Is the following:


const7 A constant that can be represented in 7 bits. Typically, this is an
expression involving the logical OR of some combination of
#Lookaside, #MemIssue, #Sync, #StoreStore, #LoadStore,
#StoreLoad, and #LoadLoad (see TABLE 7-7 and TABLE 7-8 on
page 261 for a complete list of mnemonics).
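
To make the "logical OR" concrete, the Python sketch below builds a 7-bit membar_mask from the value names in TABLE C-1 and splits it back into its two fields. The helper names are hypothetical, and the split assumes the MEMBAR encoding in which cmask occupies bits 6:4 of the mask and mmask occupies bits 3:0.

```python
# Values as listed in TABLE C-1.
MMASK = {"#LoadLoad": 0x01, "#StoreLoad": 0x02, "#LoadStore": 0x04, "#StoreStore": 0x08}
CMASK = {"#Lookaside": 0x10, "#MemIssue": 0x20, "#Sync": 0x40}


def membar_mask(*names: str) -> int:
    """OR together the named constants to form a 7-bit membar_mask."""
    values = {**MMASK, **CMASK}
    mask = 0
    for name in names:
        mask |= values[name]   # raises KeyError for an unknown name
    return mask


def split_mask(mask: int):
    """Return (cmask, mmask), assuming cmask = bits 6:4 and mmask = bits 3:0."""
    return (mask >> 4) & 0x7, mask & 0xF
```

For instance, `membar_mask("#StoreLoad", "#StoreStore")` yields 0x0A, the value an assembler would compute for the operand `#StoreLoad | #StoreStore`.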


prefetch_fcn (prefetch function) Can be any of the following:
0-31
Predefined constants (the values of which fall in the 0-31 range) useful as
prefetch_fcn values can be found in TABLE C-1 on page 498.


regaddr (register-only address) Can be any of the following:


reg_rs1 (equivalent to reg_rs1 + %g0)
reg_rs1 + reg_rs2


reg_or_imm (register or immediate value) Can be either of:


reg_rs2
simm13


reg_or_imm10 (register or immediate value) Can be either of:


reg_rs2
simm10


reg_or_imm11 (register or immediate value) Can be either of:


reg_rs2
simm11


reg_or_shcnt (register or shift count value) Can be any of:


reg_rs2
shcnt32
shcnt64


software_trap_number Can be any of the following:


reg_rs1 (equivalent to reg_rs1 + %g0)
reg_rs1 + reg_rs2
reg_rs1 + simm8
reg_rs1 - simm8
simm8 (equivalent to %g0 + simm8)
simm8 + reg_rs1 (equivalent to reg_rs1 + simm8)

The resulting operand value (software trap number) must be in the range 0-255, 
inclusive. 


C.1.6 Comments 


Two types of comments are accepted by the SPARC V9 assembler: C-style "/*...*/"
comments, which may span multiple lines, and "!..." comments, which extend
from the "!" to the end of the line.





C.2 Syntax Design 


The SPARC V9 assembly language syntax is designed so that the following 
statements are true: 


m The destination operand (if any) is consistently specified as the last (rightmost) 
operand in an assembly language instruction. 
m A reference to the contents of a memory location (for example, in a load, store, or
load-store instruction) is always indicated by square brackets ([]); a reference to 
the address of a memory location (such as in a JMPL, CALL, or SETHI) is specified 
directly, without square brackets. 





C.3 Synthetic Instructions 


TABLE C-2 describes the mapping of a set of synthetic (or “pseudo”) instructions to 
actual instructions. These synthetic instructions are provided by the SPARC V9 
assembler for the convenience of assembly language programmers. 


Note: Synthetic instructions should not be confused with “pseudo ops,” which 
typically provide information to the assembler but do not generate instructions. 
Synthetic instructions always generate instructions; they provide more mnemonic 
syntax for standard SPARC V9 instructions. 


TABLE C-2 Mapping Synthetic to SPARC V9 Instructions


Synthetic Instruction               SPARC V9 Instruction(s)                   Comment

cmp      reg_rs1, reg_or_imm        subcc  reg_rs1, reg_or_imm, %g0           Compare.
jmp      address                    jmpl   address, %g0
call     address                    jmpl   address, %o7
iprefetch label                     bn,a,pt %xcc, label                       Originally envisioned as an encoding for an
                                                                              "instruction prefetch" operation, but functions
                                                                              as a NOP on all UltraSPARC Architecture
                                                                              implementations. (See PREFETCH function 17 on
                                                                              page 280 for an alternative method of
                                                                              prefetching instructions.)
tst      reg_rs1                    orcc   %g0, reg_rs1, %g0                  Test.
ret                                 jmpl   %i7+8, %g0                         Return from subroutine.
retl                                jmpl   %o7+8, %g0                         Return from leaf subroutine.
restore                             restore %g0, %g0, %g0                     Trivial RESTORE.
save                                save   %g0, %g0, %g0                      Trivial SAVE. (Warning: trivial SAVE should
                                                                              only be used in kernel code!)

setuw    value, reg_rd              sethi  %hi(value), reg_rd                 (When ((value & 3FF₁₆) == 0).)
                                      -or-
                                    or     %g0, value, reg_rd                 (When 0 ≤ value ≤ 4095).
                                      -or-
                                    sethi  %hi(value), reg_rd                 (Otherwise)
                                    or     reg_rd, %lo(value), reg_rd         Warning: do not use setuw in the delay slot
                                                                              of a DCTI.
set      value, reg_rd                                                        synonym for setuw.

setsw    value, reg_rd              sethi  %hi(value), reg_rd                 (When (value >= 0) and ((value & 3FF₁₆) == 0).)
                                      -or-
                                    or     %g0, value, reg_rd                 (When -4096 ≤ value ≤ 4095).
                                      -or-
                                    sethi  %hi(value), reg_rd                 (Otherwise, if (value < 0) and
                                    sra    reg_rd, %g0, reg_rd                ((value & 3FF₁₆) == 0))
                                      -or-
                                    sethi  %hi(value), reg_rd                 (Otherwise, if value ≥ 0)
                                    or     reg_rd, %lo(value), reg_rd
                                      -or-
                                    sethi  %hi(value), reg_rd                 (Otherwise, if value < 0)
                                    or     reg_rd, %lo(value), reg_rd
                                    sra    reg_rd, %g0, reg_rd                Warning: do not use setsw in the delay slot
                                                                              of a CTI.

setx     value, reg, reg_rd         sethi  %hh(value), reg                    Create 64-bit constant.
                                    or     reg, %hm(value), reg               ("reg" is used as a temporary register.)
                                    sllx   reg, 32, reg
                                    sethi  %hi(value), reg_rd                 Note: setx optimizations are possible but not
                                    or     reg_rd, reg, reg_rd                enumerated here. The worst case is shown.
                                    or     reg_rd, %lo(value), reg_rd         Warning: do not use setx in the delay slot
                                                                              of a CTI.

signx    reg_rs1, reg_rd            sra    reg_rs1, %g0, reg_rd               Sign-extend 32-bit value to 64 bits.
signx    reg_rd                     sra    reg_rd, %g0, reg_rd
not      reg_rs1, reg_rd            xnor   reg_rs1, %g0, reg_rd               One's complement.
not      reg_rd                     xnor   reg_rd, %g0, reg_rd                One's complement.
neg      reg_rs2, reg_rd            sub    %g0, reg_rs2, reg_rd               Two's complement.
neg      reg_rd                     sub    %g0, reg_rd, reg_rd                Two's complement.
cas      [reg_rs1], reg_rs2, reg_rd casa   [reg_rs1] #ASI_P, reg_rs2, reg_rd     Compare and swap.
casl     [reg_rs1], reg_rs2, reg_rd casa   [reg_rs1] #ASI_P_L, reg_rs2, reg_rd   Compare and swap, little-endian.
casx     [reg_rs1], reg_rs2, reg_rd casxa  [reg_rs1] #ASI_P, reg_rs2, reg_rd     Compare and swap extended.
casxl    [reg_rs1], reg_rs2, reg_rd casxa  [reg_rs1] #ASI_P_L, reg_rs2, reg_rd   Compare and swap extended, little-endian.

inc      reg_rd                     add    reg_rd, 1, reg_rd                  Increment by 1.
inc      const13, reg_rd            add    reg_rd, const13, reg_rd            Increment by const13.
inccc    reg_rd                     addcc  reg_rd, 1, reg_rd                  Increment by 1; set icc & xcc.
inccc    const13, reg_rd            addcc  reg_rd, const13, reg_rd            Incr by const13; set icc & xcc.
dec      reg_rd                     sub    reg_rd, 1, reg_rd                  Decrement by 1.
dec      const13, reg_rd            sub    reg_rd, const13, reg_rd            Decrement by const13.
deccc    reg_rd                     subcc  reg_rd, 1, reg_rd                  Decrement by 1; set icc & xcc.
deccc    const13, reg_rd            subcc  reg_rd, const13, reg_rd            Decr by const13; set icc & xcc.
btst     reg_or_imm, reg_rs1        andcc  reg_rs1, reg_or_imm, %g0           Bit test.
bset     reg_or_imm, reg_rd         or     reg_rd, reg_or_imm, reg_rd         Bit set.
bclr     reg_or_imm, reg_rd         andn   reg_rd, reg_or_imm, reg_rd         Bit clear.
btog     reg_or_imm, reg_rd         xor    reg_rd, reg_or_imm, reg_rd         Bit toggle.
clr      reg_rd                     or     %g0, %g0, reg_rd                   Clear (zero) register.
clrb     [address]                  stb    %g0, [address]                     Clear byte.
clrh     [address]                  sth    %g0, [address]                     Clear half-word.
clr      [address]                  stw    %g0, [address]                     Clear word.
clrx     [address]                  stx    %g0, [address]                     Clear extended word.
clruw    reg_rs1, reg_rd            srl    reg_rs1, %g0, reg_rd               Copy and clear upper word.
clruw    reg_rd                     srl    reg_rd, %g0, reg_rd                Clear upper word.
mov      reg_or_imm, reg_rd         or     %g0, reg_or_imm, reg_rd
mov      %y, reg_rd                 rd     %y, reg_rd
mov      %asrn, reg_rd              rd     %asrn, reg_rd
mov      reg_or_imm, %y             wr     %g0, reg_or_imm, %y
mov      reg_or_imm, %asrn          wr     %g0, reg_or_imm, %asrn
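
The case selection for setuw in TABLE C-2 can be sketched as a small function that returns the instruction sequence for a given constant. This Python illustration is not assembler source: the function name `expand_setuw`, the textual output format, and the restriction of `value` to an unsigned 32-bit integer are assumptions made for the example. The cases are tried in the order the table lists them.

```python
def expand_setuw(value: int, rd: str) -> list:
    """Return the instruction(s) a setuw of value into rd would map to,
    following the three cases of TABLE C-2 in table order (sketch only)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("this sketch assumes an unsigned 32-bit value")
    if value & 0x3FF == 0:
        # Low 10 bits are zero, so sethi alone produces the constant.
        return [f"sethi %hi({value:#x}), {rd}"]
    if value <= 4095:
        # Small constant fits in the immediate field of a single or.
        return [f"or %g0, {value:#x}, {rd}"]
    # Otherwise: sethi supplies bits 31:10, or supplies bits 9:0.
    return [f"sethi %hi({value:#x}), {rd}",
            f"or {rd}, %lo({value:#x}), {rd}"]
```

For example, `expand_setuw(0x12345678, "%o0")` returns the two-instruction sethi/or pair. Because the expansion may be two instructions long, it cannot fit in a single delay slot, which is consistent with the table's warning.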
A definition, 7
a (annul) instruction field encoding address space information, 101 
branch instructions, 142, 143, 145, 148, 162, 165 explicit, 108 
accesses explicitly specified in instruction, 108 
cacheable, 379 implicit, See implicit ASIs 
I/O, 379 nontranslating, 12, 254, 337 
restricted ASI, 383 nontranslating ASI, 400 
with side effects, 379, 390 with prefetch instructions, 281 
accrued exception (aexc) field of FSR register, 63, real ASI, 400 
432, 482 restricted, 383, 399 
ADD instruction, 134 privileged, 383 
ADDC instruction, 134 restriction indicator, 71 
ADDcc instruction, 134, 310 SPARC V9 address, 381 
ADDCcc instruction, 134 translating ASI, 400 
address unrestricted, 383, 399 
operand syntax, 500 address space identifier (ASI) register 
space identifier (ASI), 399 for load/store alternate instructions, 71 
address mask (am) field of PSTATE register address for explicit ASI, 108 
description, 92 and LDDA instruction, 239, 252 
address space, 7, 20 and LDSTUBA instruction, 248 
address space identifier (ASI), 7, 378 load integer from alternate space 
accessing MMU registers, 467 instructions, 229 
appended to memory address, 25, 100 with prefetch instructions, 281 
architecturally specified, 383 for register-immediate addressing, 383 
changed in, 418 restoring saved state, 154, 296 
changed in UA saving state, 423 
ASI_REAL, 418 and STDA instruction, 336 
ASI_REAL_IO, 418 store floating-point into alternate space 
ASI_REAL_IO_LITTLE, 418 instructions, 323
ASI_REAL_LITTLE, 418 store integer to alternate space instructions, 314
ASI_TWINX_N, 418 and SWAPA instruction, 343
ASI_TWINX_NL, 418 after trap, 30
ASI_TWINX_NUCLEUS_LITTLE, 418 and TSTATE register, 88
and write state register instructions, 359
addressing modes, 20 ASI_AIUPL, 402, 411
ADDX instruction (SPARC V8), 134 ASI_AIUS, 402, 410
ADDxcc instruction (SPARC V8), 134 ASI_AIUS_L, 255
alias ASI_AIUSL, 402, 411
floating-point registers, 52 ASI_AS_IF_USER*, 92
aliased, 7 ASI_AS_IF_USER_NONFAULT_LITTLE, 384
ALIGNADDRESS instruction, 135 ASI_AS_IF_USER_PRIMARY, 402, 410
ALIGNADDRESS_LITTLE instruction, 135 ASI_AS_IF_USER_PRIMARY_LITTLE, 384, 402, 411, 446
alignment
data (load/store), 25, 102, 381 ASI_AS_IF_USER_SECONDARY, 384, 402, 410, 446
doubleword, 25, 102, 381 ASI_AS_IF_USER_SECONDARY_LITTLE, 384, 402, 411, 446
extended-word, 102
halfword, 25, 102, 381 ASI_AS_IF_USER_SECONDARY_NOFAULT_LITTLE, 384
instructions, 25, 102, 381
integer registers, 251, 253 ASI_BLK_AIUP, 402, 410
memory, 381, 448 ASI_BLK_AIUPL, 402, 411
quadword, 25, 102, 381 ASI_BLK_AIUS, 402, 410
word, 25, 102, 381 ASI_BLK_AIUSL, 402, 411
ALLCLEAN instruction, 136 ASI_BLK_P, 407
alternate space instructions, 27, 71 ASI_BLK_PL, 407
ancillary state registers (ASRs) ASI_BLK_S, 407
access, 67 ASI_BLK_SL, 408
assembly language syntax, 496 ASI_BLOCK_AS_IF_USER_PRIMARY, 402, 410
I/O register access, 27 ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE, 402, 411
possible registers included, 288, 360
privileged, 29, 482 ASI_BLOCK_AS_IF_USER_SECONDARY, 402, 410
reading/writing implementation-dependent ASI_BLOCK_AS_IF_USER_SECONDARY_LITTLE, 402, 411
processor registers, 29, 482
writing to, 359 ASI_BLOCK_PRIMARY, 407
AND instruction, 137 ASI_BLOCK_PRIMARY_LITTLE, 407
ANDcc instruction, 137 ASI_BLOCK_SECONDARY, 407
ANDN instruction, 137 ASI_BLOCK_SECONDARY_LITTLE, 408
ANDNcc instruction, 137 ASI_FL16_P, 406
annul bit ASI_FL16_PL, 407
in branch instructions, 148 ASI_FL16_PRIMARY, 406
in conditional branches, 163 ASI_FL16_PRIMARY_LITTLE, 407
annulled branches, 148 ASI_FL16_S, 406
application program, 7, 67 ASI_FL16_SECONDARY, 406
architectural direction note, 5 ASI_FL16_SECONDARY_LITTLE, 407
architecture, meaning for SPARC V9, 19 ASI_FL16_SL, 407
arithmetic overflow, 70 ASI_FL8_P, 406
ARRAY16 instruction, 138 ASI_FL8_PL, 406
ARRAY32 instruction, 138 ASI_FL8_PRIMARY, 406
ARRAY8 instruction, 138 ASI_FL8_PRIMARY_LITTLE, 406
ASI, 7 ASI_FL8_S, 406
invalid, and data_access_exception, 446 ASI_FL8_SECONDARY, 406
ASI register, 67 ASI_FL8_SECONDARY_LITTLE, 406
ASI, See address space identifier (ASI) ASI_FL8_SL, 406
ASI_AIUP, 402, 410 ASI_MMU_CONTEXTID, 403
ASI_N, 401
ASI_NL, 402

ASI_NUCLEUS, 108, 401 

ASI_NUCLEUS_LITTLE, 108, 402 

ASI_NUCLEUS_QUAD_LDD (deprecated), 418 

ASI_NUCLEUS_QUAD_LDD_L (deprecated), 418 

ASI_NUCLEUS_QUAD_LDD_LITTLE 
(deprecated), 418 

ASI_P, 405 

ASI_PHY_BYPASS_EC_WITH_EBIT_L, 418 

ASI_PHYS_BYPASS_EC_WITH_EBIT, 418 

ASI_PHYS_BYPASS_EC_WITH_EBIT_LITTLE 
18 

ASI_PHYS_USE_EC, 418 

ASI_PHYS_USE_EC_L, 418 

ASI_PHYS_USE_EC_LITTLE, 418 

ASI_PL, 405 

ASI_PNF, 405 

ASI_PNFL, 405 





ASI_PRIMARY, 108, 383, 384, 405 
ASI_PRIMARY_LITTLE, 108, 383, 405 
ASI_PRIMARY_NO_FAULT, 380, 397, 405, 446 
ASI_PRIMARY_NO_FAULT_LITTLE, 380, 397, 405, 446
ASI_PRIMARY_NOFAULT_LITTLE, 384
ASI_PST16_P, 329, 405
ASI_PST16_PL, 329, 406 
ASI_PST16_PRIMARY, 405 
ASI_PST16_PRIMARY_LITTLE, 406 
ASI_PST16_S, 329, 405 
ASI_PST16_SECONDARY, 405 
ASI_PST16_SECONDARY_LITTLE, 406 
ASI_PST16_SL, 329
ASI_PST32_P, 329, 406
ASI_PST32_PL, 329, 406 
ASI_PST32_PRIMARY, 406 
ASI_PST32_PRIMARY_LITTLE, 406 
ASI_PST32_S, 329, 406 
ASI_PST32_SECONDARY, 406 
ASI_PST32_SECONDARY_LITTLE, 406 
ASI_PST32_SL, 329, 406 
ASI_PST8_P, 405
ASI_PST8_PL, 406
ASI_PST8_PRIMARY, 405 
ASI_PST8_PRIMARY_LITTLE, 406 
ASI_PST8_S, 405 
ASI_PST8_SECONDARY, 405 
ASI_PST8_SECONDARY_LITTLE, 406 
ASI_PST8_SL, 329, 406
ASI_QUAD_LDD_REAL (deprecated), 404
ASI_QUAD_LDD_REAL_LITTLE (deprecated), 404
ASI_REAL, 402, 411, 418
ASI_REAL_IO, 402, 411, 418
ASI_REAL_IO_L, 402
ASI_REAL_IO_LITTLE, 402, 412, 418
ASI_REAL_L, 402
ASI_REAL_LITTLE, 402, 412, 418
ASI_S, 405
ASI_SECONDARY, 405
ASI_SECONDARY_LITTLE, 405
ASI_SECONDARY_NO_FAULT, 397, 405, 446
ASI_SECONDARY_NO_FAULT_LITTLE, 397, 405, 446
ASI_SECONDARY_NOFAULT, 384
ASI_SL, 405
ASI_SNF, 405
ASI_SNFL, 405
ASI_TWINX_AIUP, 255, 403, 413
ASI_TWINX_AIUP_L, 255, 413
ASI_TWINX_AIUPL, 404
ASI_TWINX_AIUS, 255, 413
ASI_TWINX_AIUS_L, 404, 413
ASI_TWINX_AS_IF_USER_PRIMARY, 403, 413
ASI_TWINX_AS_IF_USER_PRIMARY_LITTLE, 404, 413
ASI_TWINX_AS_IF_USER_SECONDARY, 408, 413
ASI_TWINX_AS_IF_USER_SECONDARY_LITTLE, 404, 413
ASI_TWINX_N, 255, 404, 418
ASI_TWINX_NL, 255, 405, 413, 418
ASI_TWINX_NUCLEUS, 404, 413, 418
ASI_TWINX_NUCLEUS[_L], 381
ASI_TWINX_NUCLEUS_LITTLE, 405, 413, 418
ASI_TWINX_P, 255, 407
ASI_TWINX_PL, 255, 407
ASI_TWINX_PRIMARY, 407, 415
ASI_TWINX_PRIMARY_LITTLE, 407, 415
ASI_TWINX_R, 404, 414
ASI_TWINX_REAL, 255, 404, 414
ASI_TWINX_REAL[_L], 381
ASI_TWINX_REAL_L, 404, 414
ASI_TWINX_REAL_LITTLE, 404, 414
ASI_TWINX_S, 255, 407
ASI_TWINX_SECONDARY, 407, 415
ASI_TWINX_SECONDARY_LITTLE, 407, 415
ASI_TWINX_SL, 255, 407
ASR, 7
asr_reg, 496




atomic 
memory operations, 256, 392, 393 
store doubleword instruction, 334, 336 
store instructions, 313, 314 
atomic load-store instructions 
compare and swap, 151 
load-store unsigned byte, 247, 343 
load-store unsigned byte to alternate space, 248 
simultaneously addressing doublewords, 342 
swap R register with alternate space 
memory, 343 
swap R register with memory, 151, 342 
atomicity, 380, 488 


B 

BA instruction, 142, 143, 475 
BCC instruction, 142, 475 

bclr synthetic instruction, 504
BCS instruction, 142, 475 

BE instruction, 142, 475 

Berkeley RISCs, 21 

BG instruction, 142, 475 

BGE instruction, 142, 475 

BGU instruction, 142, 475 

Bicc instructions, 142, 469 
big-endian, 7 

big-endian byte order, 26, 90, 103 
binary compatibility, 22 

BL instruction, 475 

BLD, 7 

BLD, See LDBLOCKF instruction 
BLE instruction, 142, 475 

BLEU instruction, 142, 475 

block load instructions, 53, 232, 415 
block store instructions, 53, 317, 415 
blocked byte formatting, 139 
BMASK instruction, 144 

BN instruction, 142, 475 

BNE instruction, 142, 475 

BNEG instruction, 142, 475 

BP instructions, 475 

BPA instruction, 145, 475 

BPCC instruction, 145, 475 

BPcc instructions, 70, 71, 145, 476 
BPCS instruction, 145, 475 

BPE instruction, 145, 475 

BPG instruction, 145, 475 

BPGE instruction, 145, 475 


BPGU instruction, 145, 475 
BPL instruction, 145, 475 
BPLE instruction, 145, 475 
BPLEU instruction, 145, 475 
BPN instruction, 145, 475 
BPNE instruction, 145, 475 
BPNEG instruction, 145, 475 
BPOS instruction, 142, 475 
BPPOS instruction, 145, 475 
BPr instructions, 148, 475 
BPVC instruction, 145, 475 
BPVS instruction, 145, 475 
branch 
annulled, 148 
delayed, 99 
elimination, 115, 116 
fcc-conditional, 163, 165 
icc-conditional, 143 
instructions 
on floating-point condition codes, 162 
on floating-point condition codes with 
prediction, 164 
on integer condition codes with prediction 
(BPcc), 145 
on integer condition codes, See Bicc instructions
when contents of integer register match 
condition, 148 
prediction bit, 148 
unconditional, 142, 146, 162, 165 
with prediction, 20 
BRGEZ instruction, 148 
BRGZ instruction, 148 
BRLEZ instruction, 148 
BRLZ instruction, 148 
BRNZ instruction, 148 
BRZ instruction, 148 
bset synthetic instruction, 504 
BSHUFFLE instruction, 144 
BST, 7 
BST, See STBLOCKF instruction 
btog synthetic instruction, 504 
btst synthetic instruction, 504 
BVC instruction, 142, 475 
BVS instruction, 142, 475 
byte, 7 
addressing, 108 
data format, 33 
order, 26 




order, big-endian, 26 

order, little-endian, 26 
byte order 

big-endian, 90 

implicit, 91 

in trap handlers, 431 

little-endian, 90 


C 


cache 
coherency protocol, 379 
data, 387 
instruction, 387 
miss, 286 
nonconsistent instruction cache, 387 
cacheable accesses, 378 
caching, TSB, 466 
CALL instruction 
description, 150 
displacement, 28 
does not change CWP, 50 
and JMPL instruction, 226 
writing address into R[15], 52 
call synthetic instruction, 502 
CANRESTORE (restorable windows) register, 83 
and clean window exception, 117 
and CLEANWIN register, 84, 85, 451 
counting windows, 85 
decremented by RESTORE instruction, 292 
decremented by SAVED instruction, 302 
detecting window underflow, 50 
if registered window was spilled, 293 
incremented by SAVE instruction, 300 
modified by NORMALW instruction, 274 
modified by OTHERW instruction, 276 
range of values, 82, 489 
RESTORE instruction, 117 
specification for RDPR instruction, 290 
specification for WRPR instruction, 361 
window underflow, 451 
CANSAVE (savable windows) register, 83 
decremented by SAVE instruction, 300 
detecting window overflow, 50 
FLUSHW instruction, 177 
if equals zero, 117 
incremented by RESTORE, 292 
incremented by SAVED instruction, 302 
range of values, 82, 489 


SAVE instruction, 452 
specification for RDPR instruction, 290 
specification for WRPR instruction, 361 
window overflow, 450 
CAS synthetic instruction, 393 
CASA instruction, 151 
32-bit compare-and-swap, 392 
alternate space addressing, 26 
and data access exception (noncacheable page) 
exception, 446 
atomic operation, 247 
hardware primitives for mutual exclusion of 
CASXA, 391 
in multiprocessor system, 248, 342, 343 
R register use, 101 
word access (memory), 102 
casn synthetic instructions, 503 
CASX synthetic instruction, 392, 393 
CASXA instruction, 151 
64-bit compare-and-swap, 392 
alternate space addressing, 26 
and data access exception (noncacheable page) 
exception, 446 
atomic operation, 248 
doubleword access (memory), 102 
hardware primitives for mutual exclusion of 
CASA, 391 
in multiprocessor system, 247, 248, 342, 343 
R register use, 101 
catastrophic error exception, 424 
CC0 instruction field
branch instructions, 145, 165 
floating point compare instructions, 169 
move instructions, 266, 476 
CC1 instruction field 
branch instructions, 145, 165 
floating point compare instructions, 169 
move instructions, 266, 476 
CC2 instruction field 
move instructions, 266, 476 
CCR (condition codes register), 7 
CCR (condition codes) register, 69 
32-bit operation (icc) bit of condition field, 70, 71 
64-bit operation (xcc) bit of condition field, 70, 
71 
ADD instructions, 134 
ASR for, 67 
carry (c) bit of condition fields, 70 
icc field, See CCR.icc field 
MULScc instruction, 270
negative (n) bit of condition fields, 70 
overflow bit (V) in condition fields, 70 
restored by RETRY instruction, 154, 296 
saved after trap, 423 
saving after trap, 30 
TSTATE register, 88 
write instructions, 359 
xcc field, See CCR.xcc field 
zero (Z) bit of condition fields, 70 
CCR.icc field 
add instructions, 134, 345 
bit setting for signed division, 305 
bit setting for signed/unsigned multiply, 311, 
356 
bit setting for unsigned division, 355 
branch instructions, 143, 146, 266 
integer subtraction instructions, 341 
logical operation instructions, 137, 275, 363 
MULScc instruction, 270 
Tcc instruction, 349 
CCR.xcc field 
add instructions, 134, 345 
bit setting for signed/unsigned divide, 305, 355 
bit setting for signed/unsigned multiply, 311, 
356 
branch instructions, 146, 266 
logical operation instructions, 137, 275, 363 
subtract instructions, 341 
Tcc instruction, 349 
clean register window, 300, 445 
clean window, 7 
and window traps, 86, 450 
CLEANWIN register, 85 
definition, 451 
number is zero, 117 
trap handling, 452 
clean window exception, 83, 117, 301, 445, 451, 484 
CLEANWIN (clean windows) register, 83 
CANSAVE instruction, 117 
clean window counting, 84 
incremented by trap handler, 452 
range of values, 82, 489 
specification for RDPR instruction, 290 
specification for WRPR instruction, 361 
specifying number of available clean 
windows, 451 
value calculation, 85 
clock cycle, counts for virtual processor, 72 


clock tick registers, See TICK and STICK registers 
clock-tick register (TICK), 449 
clrn synthetic instructions, 504 
cmp synthetic instruction, 341, 502 
code 
self-modifying, 393 
coherence, 8 
between processors, 488 
data cache, 387 
domain, 379 
memory, 380 
unit, memory, 381 
compare and swap instructions, 151 
comparison instruction, 110, 341 
compatibility note, 5 
completed (memory operation), 8 
compliant SPARC V9 implementation, 23 
cond instruction field 
branch instructions, 143, 145, 163, 165 
floating point move instructions, 180 
move instructions, 266 
condition codes 
adding, 345 
effect of compare-and-swap instructions, 152 
extended integer (xcc), 71 
floating-point, 163 
icc field, 70 
integer, 69 
results of integer operation (icc), 71 
subtracting, 341, 351 
trapping on, 349 
xcc field, 70 
condition codes register, See CCR register 
conditional branches, 143, 163, 165 
conditional move instructions, 29 
conforming SPARC V9 implementation, 23 
consistency 
between instruction and data spaces, 393 
processor, 387, 390 
processor self-consistency, 389 
sequential, 380, 388, 389 
strong, 389 
const22 instruction field of ILLTRAP 
instruction, 222 
constants, generating, 306 
context, 8 
nucleus, 176 
context identifier, 382 
control transfer 
pseudo-control-transfer via WRPR to
PSTATE.am, 93 

control-transfer instructions, 28 
control-transfer instructions (CTIs), 28, 154, 296 
conventions 

font, 2 

notational, 3 
conversion 


between floating-point formats instructions, 218 


floating-point to integer instructions, 216, 367 
integer to floating-point instructions, 173, 221 
planar to packed, 206 
copyback, 8 
CPI, 8 
CPU, pipeline draining, 82, 86 
cpu_mondo exception, 445 
cross-call, 8 
CTI, 8, 15
current exception (cexc) field of FSR register, 64, 
119, 482 
current window, 8 
current window pointer register, See CWP register 
current_little_endian (cle) field of PSTATE 
register, 90, 383 
CWP (current window pointer) register 
and instructions 
CALL and JMPL instructions, 50 
FLUSHW instruction, 177 
RDPR instruction, 290 
RESTORE instruction, 117, 292 
SAVE instruction, 116, 292, 300 
WRPR instruction, 361 
and traps 
after spill trap, 452 
after spill/fill trap, 30 
on window trap, 452 
saved by hardware, 423 
CWP (current window pointer) register, 82 
clean windows, 84 
definition, 8 
incremented/decremented, 49, 292, 300 
overlapping windows, 49 
range of values, 82, 489 
restored during RETRY, 154, 296 
specifying windows for use without 
cleaning, 451 
and TSTATE register, 88 


D 
D superscript on instruction name, 124 
d16hi instruction field 
branch instructions, 148 
d16lo instruction field 
branch instructions, 148 
data 
access, 8 
cache coherence, 387 
conversion between SIMD formats, 41 
flow order constraints 
memory reference instructions, 386 
register reference instructions, 385 
formats 
byte, 33 
doubleword, 33 
halfword, 33 
Int16 SIMD, 42 
Int32 SIMD, 42 
quadword, 33 
tagged word, 33 
Uint8 SIMD, 42 
word, 33 
memory, 395 
types 
floating-point, 33 
signed integer, 33 
unsigned integer, 33 
width, 33 
Data Cache Unit Control register, See DCUCR 
data_access_exception (invalid ASI) exception 
with load alternate instructions, 230 
data_access_exception exception, 445 
with compare-and-swap instructions, 153 
with LD instructions, 228 
with LDSHORTT instructions, 231, 234 
with LDTXA instructions, 257 
with load instructions, 237, 251, 254, 259 
with load instructions and ASIs, 241, 413, 414,
415, 416, 417 
with store instructions and ASIs, 241, 413, 414, 
415, 416, 417 
with STPARTIALF instructions, 331 
with SWAPA instruction, 344 
DCTI couple, 115 
DCTI instructions, 8 
behavior, 99 
RETURN instruction effects, 298 
dec synthetic instructions, 504 
deccc synthetic instructions, 504
deferred trap, 427 
distinguishing from disrupting trap, 429 
floating-point, 291 
restartable 
implementation dependency, 428 
software actions, 428 
delay instruction 
and annul field of branch instruction, 163 
annulling, 28 
conditional branches, 165 
DONE instruction, 154 
executed after branch taken, 148 
following delayed control transfer, 28 
RETRY instruction, 296 
RETURN instruction, 298 
unconditional branches, 165 
with conditional branch, 146 
delayed branch, 99 
delayed control transfer, 148 
delayed CTI, See DCTI 
denormalized number, 8 
deprecated, 8 
deprecated exceptions 
tag overflow, 449 
deprecated instructions 
FBA, 162 
FBE, 162 
FBG, 162 
FBGE, 162 
FBL, 162 
FBLE, 162 
FBLG, 162 
FBN, 162 
FBNE, 162 
FBO, 162 
FBU, 162 
FBUE, 162 
FBUGE, 162 
FBUL, 162 
FBULE, 162 
LDFSR, 243 
LDTW, 250 
LDTWA, 252 
MULScc, 69, 270 
RDY, 67, 69, 287 
SDIV, 69, 304 
SDIVcc, 69, 304 
SMUL, 69, 311
SMULcc, 69, 311
STFSR, 327 
STTW, 334 
STTWA, 336 
SWAP, 342 
SWAPA, 343 
TADDccTV, 346 
TSUBccTV, 352 
UDIV, 69, 354 
UDIVcc, 69, 354 
UMUL, 69, 356 
UMULcc, 69, 356 
WRY, 67, 69, 358 
dev_mondo exception, 446
disp19 instruction field 
branch instructions, 145, 165 
disp22 instruction field 
branch instructions, 142, 163 
disp30 instruction field 
word displacement (CALL), 150 
disrupting trap, 429 
divide instructions, 28, 272, 304, 354 
division by zero exception, 111, 272, 447 
division-by-zero bits of FSR.aexc/FSR.cexc 
fields, 66 
DONE instruction, 154 
effect on TNPC register, 88 
effect on TSTATE register, 89 
generating illegal instruction exception, 448 
modifying CCR.xcc condition codes, 70 
return from trap, 423 
return from trap handler with different GL 
value, 97 
target address, 28 
doubleword, 8 
addressing, 106 
alignment, 25, 102, 381 
data format, 33 
definition, 8 


E 

EDGE16 instruction, 156
EDGE16L instruction, 156
EDGE16LN instruction, 158
EDGE16N instruction, 158
EDGE32 instruction, 156
EDGE32L instruction, 156
EDGE32LN instruction, 158
EDGE32N instruction, 158
EDGE8 instruction, 156
EDGE8L instruction, 156
EDGE8LN instruction, 158
EDGE8N instruction, 158
emulating multiple unsigned condition codes, 116 
enable floating-point 
See FPRS register, fef field 
See PSTATE register, pef field 
even parity, 8 
exception, 9 
exceptions
See also individual exceptions
catastrophic_error, 424
causing traps, 423
clean_window, 445, 484
cpu_mondo, 445
data_access_exception, 445
definition, 424
dev_mondo, 446
division_by_zero, 447
fill_n_normal, 447
fill_n_other, 447
fp_disabled
and GSR, 76
fp_disabled, 447
fp_exception_ieee_754, 447
fp_exception_other, 447
htrap_instruction, 447
illegal_instruction, 447
instruction_access_exception, 448, 448
interrupt_level_14
and SOFTINT.int_level, 78
and STICK_CMPR.stick_cmpr, 81
and TICK_CMPR.tick_cmpr, 80
interrupt_level_14, 448
interrupt_level_15
and SOFTINT.int_level, 78
interrupt_level_n
and SOFTINT register, 77
and SOFTINT.int_level, 78
interrupt_level_n, 430, 448
LDDF_mem_address_not_aligned, 448
LDQF_mem_address_not_aligned, 450
mem_address_not_aligned, 448
nonresumable_error, 448
pending, 30
privileged_action, 448
privileged_opcode
and access to register-window PR state registers, 82, 86, 95, 97
and access to SOFTINT, 77
and access to SOFTINT_CLR, 79
and access to SOFTINT_SET, 78
and access to STICK_CMPR, 81
and access to TICK_CMPR, 79
privileged_opcode, 449
resumable_error, 449
spill_n_normal, 301, 449
spill_n_other, 301, 449
STDF_mem_address_not_aligned, 449
STQF_mem_address_not_aligned, 450
tag_overflow (deprecated), 449
trap_instruction, 449
unimplemented_LDTW, 449
unimplemented_STTW, 449
VA_watchpoint, 449
execute unit, 385 
execute state 
trap processing, 443 
explicit ASI, 9, 108, 401 
extended word, 9 
addressing, 106 








F 

F registers, 9, 24, 119, 365, 432 
FABSd instruction, 159, 473, 474 
FABSq instruction, 159, 473, 474 
FABSs instruction, 159 

FADD, 160 

FADDd instruction, 160 
FADDq instruction, 160
FADDs instruction, 160
FALIGNDATA instruction, 161
FAND instruction, 214
FANDNOT1 instruction, 214
FANDNOT1S instruction, 214
FANDNOT2 instruction, 214 
FANDNOT2S instruction, 214 
FANDS instruction, 214 

FBA instruction, 162, 163, 475 
FBE instruction, 162, 475 

FBfcc instructions, 58, 162, 447, 469, 475 
FBG instruction, 162, 475 

FBGE instruction, 162, 475 

FBL instruction, 162, 475 

FBLE instruction, 162, 475 




FBLG instruction, 162, 475 

FBN instruction, 162, 475 

FBNE instruction, 162, 475 

FBO instruction, 162, 475 

FBPA instruction, 164, 165, 475 
FBPE instruction, 164, 475 
FBPfcc instructions, 58, 164, 469, 475, 476 
FBPG instruction, 164, 475 
FBPGE instruction, 164, 475 
FBPL instruction, 164, 475 
FBPLE instruction, 164, 475 
FBPLG instruction, 164, 475 
FBPN instruction, 164, 165, 475 
FBPNE instruction, 164, 475 
FBPO instruction, 164, 475 

FBPU instruction, 164, 475 
FBPUE instruction, 164, 475 
FBPUG instruction, 164, 475 
FBPUGE instruction, 164, 475 
FBPUL instruction, 164, 475 
FBPULE instruction, 164, 475 
FBU instruction, 162, 475 

FBUE instruction, 162, 475 
FBUG instruction, 162, 475 
FBUGE instruction, 162, 475 
FBUL instruction, 162, 475 
FBULE instruction, 162, 475 
fcc-conditional branches, 163, 165 
fccn, 9 

FCMP instructions, 476 

FCMP* instructions, 58, 59, 169 
FCMPd instruction, 169, 474 
FCMPE instructions, 476 
FCMPE* instructions, 58, 59, 169 
FCMPEd instruction, 169, 474 
FCMPEq instruction, 169, 474 
FCMPEQ16 instruction, 166 
FCMPEQ32 instruction, 166 
FCMPEs instruction, 169, 474 
FCMPGT instruction, 166 
FCMPGT16 instruction, 166 
FCMPGT32 instruction, 166 
FCMPLE16 instruction, 166
FCMPLE32 instruction, 166
FCMPNE16 instruction, 166, 167
FCMPNE32 instruction, 166, 167
FCMPq instruction, 169, 474








FCMPs instruction, 169, 474 

fcn instruction field 
DONE instruction, 154 
PREFETCH, 280 
RETRY instruction, 296 

FDIVd instruction, 171 

FDIVq instruction, 171 

FDIVs instructions, 171 

FdMULq instruction, 194
FdTOi instruction, 216, 367
FdTOq instruction, 218

FdTOs instruction, 218 

FdTOx instruction, 216, 474 

fef field of FPRS register, 73 
and access to GSR, 76 
and fp_disabled exception, 447 
branch operations, 163, 165 
byte permutation, 144 
comparison operations, 167, 170 
data movement operations, 267 
enabling FPU, 92 


floating-point operations, 159, 160, 171, 173, 178, 183, 186, 194, 196, 215, 216, 218, 220, 221, 236, 239, 243, 245, 258
integer arithmetic operations, 205, 210
logical operations, 211, 212, 214
memory operations, 234
read operations, 289, 308, 319
special addressing operations, 135, 161, 321, 327, 331, 333, 339, 360
fef, See FPRS register, fef field 
FEXPAND instruction, 172 
FEXPAND operation, 172 
fill handler, 293 
fill register window, 447 
overflow/underflow, 50
RESTORE instruction, 85, 292, 451 
RESTORED instruction, 118, 294, 452 
RETRY instruction, 452 
selection of, 451 
trap handling, 451, 452 
trap vectors, 293 
window state, 85 
fill_n_normal exception, 293, 299, 447, 447
fill_n_other exception, 293, 299, 447
FiTOd instruction, 173 
FiTOq instruction, 173 
FiTOs instruction, 173 
fixed values, 223
fixed-point scaling, 189
floating point
absolute value instructions, 159
add instructions, 160
compare instructions, 58, 59, 169, 169
condition code bits, 163
condition codes (fcc) fields of FSR register, 61, 163, 165, 169
data type, 33
deferred-trap queue (FQ), 291
divide instructions, 171
exception, 9
exception, encoding type, 60
FPRS register, 359
FSR condition codes, 59
move instructions, 178
multiply instructions, 194
negate instructions, 196
operate (FPop) instructions, 9, 29, 60, 64, 119, 243
registers
destination F, 365
FPRS, See FPRS register
FSR, See FSR register
programming, 56
rounding direction, 59
square root instructions, 215
subtract instructions, 220
trap types, 9
IEEE_754_exception, 61, 62, 64, 67, 365, 366
invalid_fp_register, 159, 160
unfinished_FPop, 61, 62, 67, 160, 171, 195, 219, 220, 366
results after recovery, 62
unimplemented_FPop, 62, 67, 159, 160, 170, 171, 173, 178, 184, 187, 195, 196, 217, 219, 220, 366
traps
deferred, 291
precise, 291
floating-point condition codes (fcc) fields of FSR register, 432
floating-point operate (FPop) instructions, 447
floating-point trap types
IEEE_754_exception, 432, 447
floating-point unit (FPU), 9, 24
FLUSH instruction, 175
memory ordering control, 262
FLUSH instruction
memory/instruction synchronization, 174
FLUSH instruction, 174, 395
data access, 8
immediacy of effect, 176
in multiprocessor system, 174
in self-modifying code, 175
latency, 488
flush instruction memory, See FLUSH instruction
flush register windows instruction, 177
FLUSHW instruction, 177, 449
effect, 30
management by window traps, 86, 450
spill exception, 118, 177, 452
FMOVcc instructions
conditionally moving floating-point register contents, 71
conditions for copying floating-point register contents, 115
copying a register, 58
encoding of opf<84> bits, 474
encoding of opf_cc instruction field, 476
encoding of rcond instruction field, 475
floating-point moves, 180
FPop instruction, 119
used to avoid branches, 184, 266
FMOVccd instruction, 474
FMOVccq instruction, 474
FMOVd instruction, 178, 473, 474
FMOVDfcc instructions, 180
FMOVdGEZ instruction, 185
FMOVdGZ instruction, 185
FMOVDicc instructions, 180
FMOVdLEZ instruction, 185
FMOVdLZ instruction, 185
FMOVdNZ instruction, 185
FMOVdZ instruction, 185
FMOVq instruction, 178, 473, 474
FMOVQfcc instructions, 180, 183
FMOVqGEZ instruction, 185
FMOVqGZ instruction, 185
FMOVQicc instructions, 180, 183
FMOVqLEZ instruction, 185
FMOVqLZ instruction, 185
FMOVqNZ instruction, 185
FMOVqZ instruction, 185
FMOVr instructions, 119, 475
FMOVRq instructions, 186
FMOVRsGZ instruction, 185
FMOVRsLEZ instruction, 185
FMOVRsLZ instruction, 185
FMOVRsNZ instruction, 185
FMOVRsZ instruction, 185
FMOVs instruction, 178
FMOVScc instructions, 182
FMOVSfcc instructions, 180
FMOVsGEZ instruction, 185
FMOVSicc instructions, 180
FMOVSxcc instructions, 180
FMOVxcc instructions, 180, 183
FMUL8SUx16 instruction, 188, 191
FMUL8ULx16 instruction, 188, 191
FMUL8x16 instruction, 188, 189
FMUL8x16AL instruction, 188, 190
FMUL8x16AU instruction, 188, 190
FMULd instruction, 194
FMULD8SUx16 instruction, 188, 192
FMULD8ULx16 instruction, 188, 193
FMULq instruction, 194
FMULs instruction, 194
FNAND instruction, 214
FNANDS instruction, 214
FNEG instructions, 196
FNEGd instruction, 196, 473, 474
FNEGq instruction, 196, 473, 474
FNEGs instruction, 196
FNOR instruction, 214
FNORS instruction, 214
FNOT1 instruction, 212
FNOT1S instruction, 212
FNOT2 instruction, 212
FNOT2S instruction, 212
FONE instruction, 211
FONES instruction, 211
FOR instruction, 214
formats, instruction, 100
FORNOT1 instruction, 214
FORNOT1S instruction, 214
FORNOT2 instruction, 214
FORNOT2S instruction, 214
FORS instruction, 214


fp_disabled exception, 447
absolute value instructions, 159, 160, 220
and GSR, 76
FPop instructions, 119
FPRS.fef disabled, 73
PSTATE.pef not set, 73, 74
with branch instructions, 163, 165
with compare instructions, 168
with conversion instructions, 173, 217, 219, 221
with floating-point arithmetic instructions, 171, 195, 205, 210
with FMOV instructions, 178
with load instructions, 241
with move instructions, 184, 187, 267
with negate instructions, 196
with store instructions, 321, 322, 325, 327, 328, 331, 333, 339, 340, 360


fp_exception exception, 64
fp_exception_ieee_754 "invalid" exception, 216
fp_exception_ieee_754 exception, 447
and tem bit of FSR, 60
cause encoded in FSR.ftt, 61
FSR.aexc, 64
FSR.cexc, 65
FSR.ftt, 64
generated by FCMP or FCMPE, 59
and IEEE 754 overflow/underflow conditions, 64, 65
trap handler, 366
when FSR.tem = 0, 432
when FSR.tem = 1, 432
with floating-point arithmetic instructions, 160, 171, 195, 220


fp_exception_other exception, 67, 447
absolute value instructions, 159
cause encoded in FSR.ftt, 61
FADDq instruction, 160
FCMP{E}q instructions, 170
FDIVq instruction, 171
FdTOq, FqTOd instructions, 219
FiTOq instruction, 173
FMOVcc instruction, 184
FMOVq instruction, 178
FMOVRq instruction, 187
FMULq, FdMULq instructions, 195
FNEGq instruction, 196
FqTOx, FqTOi instructions, 217
FSQRT instructions, 215
FSUBq instruction, 220
FxTOq instruction, 221
incorrect IEEE Std 754-1985 result, 119, 481
occurrence, 133
supervisor handling, 366
trap type of unfinished_FPop, 62
unimplemented_FPop for quad FPops, 57
when quad FPop unimplemented in hardware, 63
with floating-point arithmetic instructions, 171, 195
FPACK instruction, 77 
FPACK instructions, 197-201 
FPACK16 instruction, 197, 198 
FPACK16 operation, 198 
FPACK32 instruction, 197, 199 
FPACK32 operation, 199 
FPACKFIX instruction, 197, 201 
FPACKFIX operation, 201 
FPADD16 instruction, 203 
FPADD16S instruction, 203
FPADD32 instruction, 203 
FPADD32S instruction, 203 
FPMERGE instruction, 206 
FPop, 9 
FPop instruction 
unimplemented, 447 
FPop, See floating-point operate (FPop) instructions 
FPRS register 
See also floating-point registers state (FPRS) 
register 
FPRS register, 73 
ASR summary, 68 
definition, 9 
fef field, 119, 431 
RDFPRS instruction, 288 
FPRS register fields 
dl (dirty lower fp registers), 74 
du (dirty upper fp registers), 74
fef, 73 
fef, See also fef field of FPRS register 
FPSUB16 instruction, 208 
FPSUB16S instruction, 208 
FPSUB32 instruction, 208
FPSUB32S instruction, 208 
FPU, 9 
FqTOd instruction, 218 
FqTOi instruction, 216, 367 
FqTOs instruction, 218 
FqTOx instruction, 216, 473, 474 
freg, 496 
FsMULd instruction, 194 
FSQRTd instruction, 215
FSQRTq instruction, 215
FSQRTs instruction, 215
FSR (floating-point state) register 
fields 
aexc (accrued exception), 61, 62, 63, 64, 365 
aexc (accrued exceptions)
in user-mode trap handler, 366
-- dza (division by zero) bit of aexc, 66 
-- nxa (rounding) bit of aexc, 67 
cexc (current exception), 59, 61, 62, 64, 64, 65, 
365, 447 
cexc (current exceptions) 
in user-mode trap handler, 366 
-- dzc (division by zero) bit of cexc, 66 
-- nxc (rounding) bit of cexc, 67 
fcc (condition codes), 58, 61, 62, 366, 497 
fccn, 59 
ftt (floating-point trap type), 58, 60, 64, 119, 
258, 327, 339, 447 
in user-mode trap handler, 366 
not modified by LDFSR/LDXFSR 
instructions, 58 
ns (nonstandard mode), 58, 243, 258 
qne (queue not empty), 58, 63, 243, 258 
in user-mode trap handler, 366 
rd (rounding), 59 
tem (trap enable mask), 59, 63, 65, 367, 368, 
447 
ver, 60 
ver (version), 58, 258 
FSR (floating-point state) register, 58 
after floating-point trap, 365 
compliance with IEEE Std 754-1985, 67 
LDFSR instruction, 243 
reading / writing, 58 
values in ftt field, 61 
writing to memory, 327, 339 
FSRC1 instruction, 212 
FSRC1S instruction, 212
FSRC2 instruction, 212 
FSRC2S instruction, 212 
FsTOd instruction, 218 
FsTOi instruction, 216, 367 
FsTOq instruction, 218 
FsTOx instruction, 216, 473, 474
FSUBd instruction, 220
FSUBq instruction, 220
FSUBs instruction, 220 
functional choice, implementation-dependent, 481 
FXNOR instruction, 214 
FXNORS instruction, 214 
FXOR instruction, 214 
FXORS instruction, 214 
FxTOd instruction, 221, 474 
FxTOq instruction, 221, 474 
FxTOs instruction, 221, 474
FZERO instruction, 211 
FZEROS instruction, 211 


G 


general status register, See GSR (general status) 
register 
generating constants, 306 
GL register, 96 
access, 97 
during trap processing, 443 
function, 96 
reading with RDPR instruction, 290, 361 
relationship to TL, 97 
restored during RETRY, 154, 296 
SPARC V9 compatibility, 94 
and TSTATE register, 88 
value restored from TSTATE[TL], 97 
writing to, 97 
global level register, See GL register 
global registers, 20, 24, 46, 48, 48, 481 
graphics status register, See GSR (general status) 
register 
GSR (general status) register 
fields 
align, 77 
im (interval mode) field, 77 
irnd (rounding), 77 
mask, 77 
scale, 77 
GSR (general status) register 
ASR summary, 68 


H 
halfword, 9 

alignment, 25, 102, 381 

data format, 33 
hardware 

dependency, 480 

traps, 434 
hardware trap stack, 30 
htrap_instruction exception, 350, 447 
hyperprivileged, 10 


i (integer) instruction field 


arithmetic instructions, 270, 272, 275, 304, 311, 
354, 356 
floating point load instructions, 236, 239, 243, 
258 
flush memory instruction, 174 
flush register instruction, 177 
jump-and-link instruction, 226 
load instructions, 227, 247, 248, 250, 252 
logical operation instructions, 137, 275, 363 
move instructions, 266, 268 
POPC, 278 
PREFETCH, 280 
RETURN, 298 
I/O
access, 379 
memory, 378 
memory-mapped, 379 
IEEE 754, 10 
IEEE Std 754-1985, 10, 19, 59, 62, 65, 67, 119, 365, 
481 


IEEE_754_exception floating-point trap type, 10, 61, 62, 64, 67, 365, 366, 432, 447
IEEE-754 exception, 10 
IER register (SPARC V8), 360 
illegal instruction 
and OTHERW instruction, 307 
illegal instruction exception, 177, 447 
attempt to write in nonprivileged mode, 80 
DONE/RETRY, 155, 297, 298 
ILLTRAP, 222 
instruction not specifically defined in 
architecture, 120 
not implemented in hardware, 133 
POPC, 279 
PREFETCH, 286 
RETURN, 299 
with BPr instruction, 149 
with branch instructions, 146, 149 
with CASA and CASXA instructions, 152, 275 
with CASXA instruction, 153 
with DONE instruction, 154 
with FMOV instructions, 178 
with FMOVcc instructions, 184
with load instructions, 52, 234, 237, 251, 253, 
259, 416 
with move instructions, 267, 269 
with read hyperprivileged register 
instructions, 290 
with read instructions, 288, 289, 290, 362, 484
with store instructions, 322, 328, 334, 335, 337,
340 
with STQFA instruction, 325 
with Tcc instructions, 350 
with TPC register, 86 
with TSTATE register, 88 
with write instructions, 360, 362 
write to ASR 5, 73 
write to STICK register, 80 
write to TICK register, 72 
ILLTRAP instruction, 222, 447 
imm_asi instruction field 
explicit ASI, providing, 108 
floating point load instructions, 239 
load instructions, 248, 250, 252 
PREFETCH, 280 
immediate CTI, 99 
I-MMU 
and instruction prefetching, 380 
IMPDEP1 instruction, 224 
IMPDEP1 instructions, 223, 477, 478 
IMPDEP2A instructions, 223, 448, 485 
IMPDEP2B instructions, 120, 223, 448 
implementation, 10 
implementation dependency, 479 
implementation dependent, 10 
implementation note, 4, 5 
implementation-dependent functional choice, 481 
implementation-dependent instructions, See 
IMPDEP2A instructions 
implicit ASI, 10, 108, 400 
implicit ASI memory access 
LDFSR, 243 
LDSTUB, 247 
load fp instructions, 236, 258 
load integer doubleword instructions, 250 
load integer instructions, 227 
STD, 334 
STFSR, 327 
store floating-point instructions, 321, 339 
store integer instructions, 313 
SWAP, 342 
implicit byte order, 91 
in registers, 46, 49, 300 
inccc synthetic instructions, 504 
inexact accrued (nxa) bit of aexc field of FSR 
register, 367 
inexact current (nxc) bit of cexc field of FSR 
register, 367 


inexact mask (nxm) field of FSR.tem, 66 
inexact quotient, 304, 354 
infinity, 367, 368 
initiated, 10 
input/output (I/O) locations 
access by nonprivileged code, 482 
behavior, 378 
contents and addresses, 482 
identifying, 488 
order, 378 
semantics, 488 
value semantics, 378 
instruction fields, 10 
See also individual instruction fields 
definition, 10 
instruction group, 10 
instruction MMU, See I-MMU 
instruction prefetch buffer, invalidation, 175 
instruction set architecture (ISA), 10, 10, 21 
instruction_access_exception exception, 448
instructions 
32-bit wide, 20 
alignment, 102 
alignment, 25, 135, 381 
arithmetic, integer 
addition, 134, 345 
division, 28, 272, 304, 354 
multiplication, 28, 270, 272, 311, 356 
subtraction, 341, 351 
tagged, 28 
array addressing, 138 
atomic 
CASA/CASXA, 151 
load twin extended word from alternate 
space, 255 
load-store, 101, 151, 247, 248, 342, 343 
load-store unsigned byte, 247, 248 
successful loads, 227, 229, 251, 253 
successful stores, 313, 314 
branch 
branch if contents of integer register match 
condition, 148 
branch on floating-point condition codes, 162, 
164 
branch on integer condition codes, 142, 145 
cache, 387 
causing illegal instruction, 222 
compare and swap, 151 
comparison, 110, 341
conditional move, 29

control-transfer (CTIs), 28, 154, 296 

conversion 
convert between floating-point formats, 218 
convert floating-point to integer, 216 
convert integer to floating-point, 173, 221 
floating-point to integer, 367 

count of number of bits, 278 

edge handling, 156 

fetches, 102 

floating point 
compare, 58, 59, 169
floating-point add, 160 
floating-point divide, 171 
floating-point load, 101, 236 
floating-point load from alternate space, 239 
floating-point load state register, 236, 258 
floating-point move, 178, 180, 185 
floating-point operate (FPop), 29, 243 
floating-point square root, 215 
floating-point store, 101, 321 
floating-point store to alternate space, 323 
floating-point subtract, 220 
operate (FPop), 60, 64 
short floating-point load, 245 
short floating-point store, 332 
status of floating-point load, 243 

flush instruction memory, 174 

flush register windows, 177 

formats, 100 

implementation-dependent, See IMPDEP2A 
instructions 

jump and link, 28, 226 

loads 
block load, 232 
floating point, See instructions: floating point 
integer, 101 
simultaneously addressing doublewords, 342 
unsigned byte, 151, 247 
unsigned byte to alternate space, 248 

logical operations 
64-bit/32-bit, 212, 214
AND, 137 
logical 1-operand ops on F registers, 211 
logical 2-operand ops on F registers, 212 
logical 3-operand ops on F registers, 214 
logical XOR, 363 
OR, 275 

memory, 395 


moves 
floating point, See instructions: floating point 
move integer register, 264, 268 
on condition, 20 
ordering MEMBAR, 110 
permuting bytes specified by GSR.mask, 144 
pixel component distance, 277, 277 
pixel formatting (PACK), 197 
prefetch data, 280 
read privileged register, 290 
read state register, 29, 287 
register window management, 30 
reordering, 385 
reserved, 120 
reserved fields, 133 
RETRY 
and restartable deferred traps, 428 
RETURN vs. RESTORE, 298 
sequencing MEMBAR, 110 
set high bits of low word, 306 
set interval arithmetic mode, 308 
setting GSR.mask field, 144 
shift, 28 
shift, 309 
shift count, 309 
shut down to enter power-down mode, 307 
SIMD, 15 
simultaneous addressing of doublewords, 343 
stores 
block store, 317 
floating point, See instructions: floating point 
integer, 101, 313 
integer (except doubleword), 313 
integer into alternate space, 314 
partial, 329 
unsigned byte, 151 
unsigned byte to alternate space, 248 
unsigned bytes, 247 
swap R register, 342, 343 
synthetic (for assembly language 
programmers), 502-504 
tagged addition, 345 
test-and-set, 393 
timing, 133 
trap on integer condition codes, 348 
write privileged register, 361 
write state register, 359 


integer unit (IU) 


condition codes, 71
definition, 10
description, 24 
interrupt 
enable (ie) field of PSTATE register, 430, 431 
level, 95 
request, 10, 30, 423 
interrupt_level_14 exception, 78, 448 
and SOFTINT.int_level, 78 
and STICK_CMPR.stick_cmpr, 81 
and TICK_CMPR.tick_cmpr, 80 
interrupt_level_15 exception 
and SOFTINT.int_level, 78 
interrupt_level_n exception, 430, 448 
and SOFTINT register, 77 
and SOFTINT.int_level, 78 
inter-strand operation, 10 
intra-strand operation, 10 
invalid accrued (nva) bit of aexc field of FSR 
register, 66 
invalid ASI 
and data_access_exception, 446 
invalid current (nvc) bit of cexc field of FSR 
register, 66, 367, 368 
invalid mask (nvm) field of FSR.tem, 66, 367, 368
invalid_exception exception, 216 
invalid_fp_register floating-point trap type, 159, 
160, 170, 171, 173, 178, 184, 215 
INVALW instruction, 225 
iprefetch synthetic instruction, 502 
ISA, 10 
ISA, See instruction set architecture 
issue unit, 385, 385 
issued, 11 
italic font, in assembly language syntax, 495 
IU, 11 
ixc synthetic instructions, 504 
data_access_exception (invalid ASI)
with load alternate instructions, 253 


J 
jmp synthetic instruction, 502 
JMPL instruction, 226 
computing target address, 28 
does not change CWP, 50 
mem_address_not_aligned exception, 448 
reexecuting trapped instruction, 298 
jump and link, See JMPL instruction 


L 
LD instruction (SPARC V8), 227 
LDBLOCKF instruction, 232, 415 
LDD instruction (SPARC V8 and V9), 251 
LDDA instruction, 414 
LDDA instruction (SPARC V8 and V9), 253 
LDDF instruction, 102, 236, 448 
LDDF_mem_address_not_aligned exception, 448
address not doubleword aligned, 486
address not quadword aligned, 487
LDDF/LDDFA instruction, 102
load instruction with partial store ASI and 
misaligned address, 241 
with load instructions, 237, 240, 416 
with store instructions, 324, 416 
LDDF mem not aligned exception, 57 
LDDFA instruction, 239, 331 
alignment, 102 
ASIs for fp load operations, 416 
behavior with partial store ASIs, 237-??, 241,
241-??, 258-??, 416-??
causing LDDF_mem_address_not_aligned
exception, 102, 448
for block load operations, 415 
used with ASIs, 415 
LDF instruction, 57, 236 
LDFA instruction, 57, 239 
LDFSR instruction, 58, 60, 61, 243, 448 
LDQF instruction, 236, 450 
LDQF_mem_address_not_aligned exception, 450
address not quadword aligned, 487
LDQF/LDQFA instruction, 103
with load instructions, 240
LDQFA instruction, 239
LDSB instruction, 227 
LDSBA instruction, 229 
LDSH instruction, 227 
LDSHA instruction, 229 
LDSHORTF instruction, 245
LDSTUB instruction, 101, 247, 248, 393 
and data_access_exception (noncacheable page)
exception, 446
hardware primitives for mutual exclusion of 
LDSTUB, 392 
LDSTUBA instruction, 247, 248 
alternate space addressing, 26 
and data_access_exception exception, 446
hardware primitives for mutual exclusion of 
LDSTUBA, 392
LDSW instruction, 227
LDSWA instruction, 229 
LDTW instruction, 52, 102 
LDTW instruction (deprecated), 250 
LDTWA instruction, 52, 102 
LDTWA instruction (deprecated), 252 
LDTX instruction, 412 
LDTXA instruction, 104, 106, 255, 413 
access alignment, 102 
access size, 102 
and data_access_exception (noncacheable page) 
exception, 446 
LDUB instruction, 227 
LDUBA instruction, 229 
LDUH instruction, 227 
LDUHA instruction, 229 
LDUW instruction, 227 
LDUWA instruction, 229 
LDX instruction, 227 
LDXA instruction, 229, 254, 390 
LDXFSR instruction, 58, 60, 61, 243, 258, 302, 448 
leaf procedure 
modifying windowed registers, 117 
little-endian byte order, 11, 26, 90 
load 
block, See block load instructions 
floating-point from alternate space 
instructions, 239 
floating-point instructions, 236, 243 
floating-point state register instructions, 236, 258 
from alternate space, 27, 71, 108 
instructions, 11 
instructions accessing memory, 101 
nonfaulting, 384 
short floating-point, See short floating-point load 
instructions 
LoadLoad MEMBAR relationship, 261, 394
LoadLoad predefined constant, 500 
loads 
nonfaulting, 396, 397 
load-store alignment, 25, 102, 381 
load-store instructions 
compare and swap, 151 
definition, 11 
load-store unsigned byte, 151, 247, 342, 343 
load-store unsigned byte to alternate space, 248 
memory access, 25 
swap R register with alternate space
memory, 343
swap R register with memory, 151, 342
LoadStore MEMBAR relationship, 261, 394 
LoadStore predefined constant, 500 
local registers, 46, 49, 292 
logical XOR instructions, 363 
Lookaside predefined constant, 500 
LSTPARTIALF instruction, 416 





M 
MAXPGL, 24, 46, 48, 94, 96, 96, 97, 492 
MAXPTL 
and MAXPGL, 97 
instances of TNPC register, 87 
instances of TPC register, 86 
instances of TSTATE register, 88 
instances of TT register, 89 
may (keyword), 11 
mem_address_not_aligned exception, 448 
JMPL instruction, 226 
LDTXA, 413, 414, 415 
load instruction with partial store ASI and 
misaligned address, 241 
RETURN, 299 
when recognized, 153 
with CASA instruction, 152 
with compare instructions, 153 
with load instructions, 102-103, 227, 228, 230, 
236, 243, 251, 253, 254, 258, 339, 415, 416 
with store instructions, 102-103, 313, 314, 316, 
325, 328, 335, 337, 415, 416 
with swap instructions (deprecated), 342, 344 
MEMBAR 
#Sync 
semantics, 263 
instruction 
atomic operation ordering, 393 
FLUSH instruction, 174, 395 
functions, 260, 393-395 
memory ordering, 262 
memory synchronization, 110 
side-effect accesses, 380 
STBAR instruction, 262 
mask encodings 
#LoadLoad, 261, 394 
#LoadStore, 261, 394 
#Lookaside, 261, 395 
#MemIssue, 261, 395
#StoreLoad, 261, 394
#StoreStore, 261, 394
#Sync, 261, 395
predefined constants 
LoadLoad, 500 
LoadStore, 500 
Lookaside, 500 
MemIssue, 500 
StoreLoad, 500 
StoreStore, 500 
Sync, 500 
MEMBAR 
#Lookaside, 390 
#StoreLoad, 390 
membar_mask, 500 
MemIssue predefined constant, 500 
memory 
access instructions, 25, 101 
alignment, 381 
atomic operations, 392 
atomicity, 488 
cached, 378 
coherence, 380, 488 
coherency unit, 381 
data, 395 
instruction, 395 
location, 378 
models, 377 
ordering unit, 381 
real, 378 
reference instructions, data flow order 
constraints, 386 
synchronization, 262 
virtual address, 378 
virtual address 0, 397 
Memory Management Unit 
definition, 11 
Memory Management Unit, See MMU 
memory model 
mode control, 389 
partial store order (PSO), 388 
relaxed memory order (RMO), 262, 388 
sequential consistency, 389 
strong, 389 
total store order (TSO), 262, 388, 389 
weak, 388 
memory model (mm) field of PSTATE register, 91 
memory order 
pending transactions, 387
program order, 385
memory_model (mm) field of PSTATE register, 389 
memory-mapped I/O, 379 
metrics 
for architectural performance, 421 
for implementation performance, 421 
See also performance monitoring hardware 
MMU 
accessing registers, 467 
definition, 11 
page sizes, 461 
mode 
nonprivileged, 22 
privileged, 24, 86, 383 
motion estimation, 277 
MOVA instruction, 264 
MOVCC instruction, 264 
MOVcc instructions, 264 
conditionally moving integer register 
contents, 71 
conditions for copying integer register 
contents, 115 
copying a register, 58 
encoding of cond field, 475 
encoding of opf_cc instruction field, 476 
used to avoid branches, 184, 266 
MOVCS instruction, 264 
move floating-point register if condition is true, 180 
move floating-point register if contents of integer 
register satisfy condition, 185 
MOVE instruction, 264 
move integer register if condition is satisfied 
instructions, 264 
move integer register if contents of integer register 
satisfies condition instructions, 268 
move on condition instructions, 20 
MOVFA instruction, 265 
MOVFE instruction, 265
MOVFG instruction, 265
MOVFGE instruction, 265
MOVFL instruction, 265
MOVFLE instruction, 265
MOVFLG instruction, 265
MOVFN instruction, 265
MOVFNE instruction, 265
MOVFO instruction, 265
MOVFU instruction, 265
MOVFUE instruction, 265
MOVFUG instruction, 265
MOVFUGE instruction, 265

MOVFUL instruction, 265 

MOVFULE instruction, 265 

MOVG instruction, 264 

MOVGE instruction, 264 

MOVGU instruction, 264 

MOVL instruction, 264 

MOVLE instruction, 264 

MOVLEU instruction, 264 

MOVN instruction, 264 

movn synthetic instructions, 504 

MOVNE instruction, 264 

MOVNEG instruction, 264 

MOVPOS instruction, 264 

MOVr instructions, 116, 268, 475 

MOVRGEZ instruction, 268 

MOVRGZ instruction, 268 

MOVRLEZ instruction, 268 

MOVRLZ instruction, 268 

MOVRNZ instruction, 268 

MOVRZ instruction, 268 

MOVVC instruction, 264 

MOVVS instruction, 264 

multiple unsigned condition codes, emulating, 116 

multiply instructions, 272, 311, 356 

multiprocessor synchronization instructions, 151, 
342, 343 

multiprocessor system, 11, 174, 285, 342, 343, 387, 
488 

MULX instruction, 272 

must (keyword), 11 





N 
N superscript on instruction name, 124 
N_REG_WINDOWS, 12
integer unit registers, 24, 481 
RESTORE instruction, 292 
SAVE instruction, 300 
value of, 46, 82 
NaN (not-a-number) 
conversion to integer, 367 
converting floating-point to integer, 216 
signalling, 59, 169, 218 
neg synthetic instructions, 503 
negative infinity, 367, 368 
nested traps, 20 
next program counter register, See NPC register 
NFO, 11 


noncacheable 
accesses, 378 
nonfaulting load, 11, 384 
nonfaulting loads 
behavior, 396 
use by optimizer, 397 
nonleaf routine, 226 
nonprivileged, 12 
mode, 7, 12, 22, 24, 61 
software, 73 
nonresumable_error exception, 448
nonstandard floating-point, See floating-point status 
register (FSR) NS field 
nontranslating ASI, 12, 254, 337, 400 
nonvirtual memory, 285 
NOP instruction, 142, 162, 165, 273, 281, 349 
normal traps, 434 
NORMALW instruction, 274 
not synthetic instructions, 503 
note 
architectural direction, 5 
compatibility, 5 
general, 4 
implementation, 4 
programming, 4 
NPC (next program counter) register, 73 
control flow alteration, 15 
definition, 11 
DONE instruction, 154 
instruction execution, 99 
relation to TNPC register, 87 
RETURN instruction, 296 
saving after trap, 30 
npt, 12 
nucleus context, 176 
nucleus software, 12 
NUMA, 12 
nvm (invalid mask) field of FSR.tem, 66, 367, 368 
NWIN, See N_REG_WINDOWS
nxm (inexact mask) field of FSR.tem, 66 


O 
octlet, 12 
odd parity, 12 
ofm (overflow mask) field of FSR.tem, 66 
op3 instruction field 
arithmetic instructions, 134, 146, 149, 151, 270, 
272, 304, 311, 354, 356
floating point load instructions, 236, 239, 243,
258 
flush instructions, 174, 177 
jump-and-link instruction, 226 
load instructions, 227, 247, 248, 250, 252 
logical operation instructions, 137, 275, 363 
PREFETCH, 280 
RETURN, 298 
opcode 
definition, 12 
format, 224 
opf instruction field 
floating point arithmetic instructions, 160, 171, 
194, 215 
floating point compare instructions, 169 
floating point conversion instructions, 216, 218, 
221 
floating point instructions, 159 
floating point integer conversion, 173 
floating point move instructions, 178 
floating point negate instructions, 196 
opf_cc instruction field
floating point move instructions, 180
move instructions, 476
opf_low instruction field, 180
optional, 12 
OR instruction, 275 
ORcc instruction, 275 
ordering MEMBAR instructions, 110 
ordering unit, memory, 381 
ORN instruction, 275 
ORNcc instruction, 275 
OTHERW instruction, 276 
OTHERWIN (other windows) register, 84 
FLUSHW instruction, 177 
keeping consistent state, 86 
modified by OTHERW instruction, 276 
partitioned, 85 
range of values, 82, 489 
rd designation for WRPR instruction, 361 
rs1 designation for RDPR instruction, 290 
SAVE instruction, 301 
zeroed by INVALW instruction, 225 
zeroed by NORMALW instruction, 274 
OTHERWIN register trap vectors 
fill/spill traps, 451 
handling spill/fill traps, 451 
selecting spill/fill vectors, 451 
out register #7, 52 


out registers, 46, 49, 300 
overflow 
bits 
(V) in condition fields of CCR, 111 
accrued (ofa) in aexc field of FSR register, 66 
current (ofc) in cexc field of FSR register, 66 
causing spill trap, 450 
tagged add/subtract instructions, 111
overflow mask (ofm) field of FSR.tem, 66 


P 
p (predict) instruction field of branch 
instructions, 145, 148, 149, 165 
P superscript on instruction name, 124 
packed-to-planar conversion, 206 
packing instructions, See FPACK instructions 
page fault, 285 
page table entry (PTE), See translation table entry 
(TTE) 
parity, even, 8 
parity, odd, 12 
partial store instructions, 329, 416 
partial store order (PSO) memory model, 388, 388 
partitioned 
additions, 203 
subtracts, 208 
P_ASI superscript on instruction name, 124
P_ASR superscript on instruction name, 124
PC (program counter) register, 13, 68, 72 
after instruction execution, 99 
CALL instruction, 150 
changed by NOP instruction, 273 
copied by JMPL instruction, 226 
saving after trap, 30 
set by DONE instruction, 154 
set by RETRY instruction, 296 
Trap Program Counter register, 86 
PCR 
ASR summary, 68 
PCR register fields 
priv, 75 
sl (select lower bits of PIC), 75 
st (system trace enable), 75
su (select upper bits of PIC), 75
ut (user trace enable), 75 
PDIST instruction, 277 
pef field of PSTATE register
and access to GSR, 76
and fp_disabled exception, 447
and FPop instructions, 119 
branch operations, 163, 165 
byte permutation, 144 
comparison operations, 167, 170 
data movement operations, 267 
enabling FPU, 73 
floating-point operations, 159, 160, 171, 173, 178, 
183, 186, 194, 196, 215, 216, 218, 220, 221, 236, 
239, 243, 245, 258 
integer arithmetic operations, 205, 210 
logical operations, 211, 212, 214 
memory operations, 234 
read operations, 289, 308, 319 
special addressing operations, 135, 161, 321, 327, 
331, 333, 339, 360 
trap control, 431 
pef, See PSTATE, pef field 
Performance Control register, See PCR 
performance instrumentation counter register, See 
PIC register 
performance monitoring hardware 
accuracy requirements, 421 
classes of data reported, 421 
counters and controls, 422 
high-level requirements, 419 
kinds of user needs, 419 
See also instruction sampling 
physical processor, 12 
PIC (performance instrumentation counter) 
register, 12, 75 
accessing, 449 
ASR summary, 68 
and PCR, 74 
picl field, 76 
picu field, 76 
PIL (processor interrupt level) register, 95 
interrupt conditioning, 430 
interrupt request level, 432 
interrupt_level_n, 448
specification of register to read, 290 
specification of register to write, 361 
trap processing control, 431 
pipeline, 13 
pipeline draining of CPU, 82, 86 
pixel instructions 
compare, 166 
component distance, 277, 277 


formatting, 197 
pixel registers for storing values, 223 
planar-to-packed conversion, 206 


P_npt superscript on instruction name, 124


POPC instruction, 278 
positive infinity, 367, 368 


P_PIC superscript on instruction name, 124


precise floating-point traps, 291 
precise trap, 426 
conditions for, 426 
software actions, 427 
vs. disrupting trap, 429 
predefined constants 
LoadLoad, 500 
Lookaside, 500
MemIssue, 500 
StoreLoad, 500 
StoreStore, 500 
Sync, 500 
predict bit, 149 
prefetch 
for one read, 284 
for one write, 285 
for several reads, 284 
for several writes, 284 
page, 285 
prefetch data instruction, 280 
PREFETCH instruction, 101, 280, 485 
prefetch_fcn, 500 
PREFETCHA instruction, 280, 485 
and invalid ASI or VA, 446 
prefetchable, 13 
priority of traps, 431, 442 
privileged_action exception
read from TICK register when access disabled, 72
privilege violation
and data_access_exception, 446, 448
privileged, 13 

mode, 24, 86 

registers, 86 


software, 22, 50, 61, 92, 109, 177, 434, 485 
privileged (priv) field of PCR register, 289 


privileged (priv) field of PSTATE register, 94, 152,
154, 155, 230, 234, 239, 240, 248, 253, 314, 319,
324, 337, 343, 344, 360, 383, 448, 449
privileged mode, 13 
privileged_action exception, 448
accessing restricted ASIs, 383
PIC access, 75


read from TICK register when access disabled, 72 


restricted ASI access attempt, 109, 400 

TICK register access attempt, 71 

with CASA instruction, 152 

with compare instructions, 153 

with load alternate instructions, 230, 234, 240,
248, 253, 314, 319, 324, 337, 344, 360

with load instructions, 239 

with RDasr instructions, 289 

with read instructions, 289 

with store instructions, 326 

with swap instructions, 344 
privileged_opcode exception, 449 

DONE instruction, 155 

RETRY instruction, 297 

SAVED instruction, 302 

with DONE instruction, 155, 290, 297, 362 

with write instructions, 362 

WRPR in nonprivileged mode, 72 
processor, 13 

execute unit, 385 

issue unit, 385, 385 

privilege-mode transition diagram, 425 

reorder unit, 385 

self-consistency, 385 
processor cluster, See processor module 
processor consistency, 387, 390 
processor interrupt level register, See PIL register 
processor self-consistency, 385, 389 
processor state register, See PSTATE register 
processor states 

execute_state, 443 
program counter register, See PC register 
program counters, saving, 423 
program order, 385, 385 
programming note, 4 
PSO, See partial store order (PSO) memory model 
PSR register (SPARC V8), 360 
PSTATE register 

fields 

priv 
and access to PCR, 74 

PSTATE register 

entering privileged execution mode, 423 

restored by RETRY instruction, 154, 296 

saved after trap, 423 

saving after trap, 30 

specification for RDPR instruction, 290 


specification for WRPR instruction, 361 
and TSTATE register, 88 
PSTATE register fields 
ag 
unimplemented, 94 
am 
CALL instruction, 150 
description, 92 
masked/unmasked address, 154, 226, 296,
298
cle 
and implicit ASIs, 108 
and PSTATE.tle, 91 
description, 90
ie
description, 94
enabling disrupting traps, 430 
interrupt conditioning, 430 
masking disrupting trap, 435 
mm 
description, 91 
implementation dependencies, 91, 388, 488 
reserved values, 91 
pef 
and FPRS.fef, 92 
description, 92 
See also pef field of PSTATE register 
priv 
access to register-window PR state 
registers, 86 
accessing restricted ASIs, 383 
description, 94 
determining mode, 12, 13, 465 
tle 
description, 91 
PTE (page table entry), See translation table entry 
(TTE) 


Q 


quadword, 13 
alignment, 25, 102, 381 
data format, 33 
quiet NaN (not-a-number), 59, 169 


R 
R register, 13 
#15, 52
special-purpose, 52
alignment, 251, 253 
rational quotient, 354 
R-A-W, See read-after-write memory hazard 
rcond instruction field 
branch instructions, 148 
encoding of, 475 
move instructions, 268 
rd (rounding), 13 
rd instruction field 
arithmetic instructions, 134, 146, 149, 151, 270, 
272, 304, 311, 354, 356 
floating point arithmetic, 160 
floating point arithmetic instructions, 171, 194, 
215 
floating point conversion instructions, 216, 218, 
221 
floating point integer conversion, 173 
floating point load instructions, 236, 239, 243, 
258 
floating point move instructions, 178, 180 
floating point negate instructions, 196 
floating-point instructions, 159 
jump-and-link instruction, 226 
load instructions, 227, 247, 248, 250, 252 
logical operation instructions, 137, 275, 363 
move instructions, 266, 268 
POPC, 278 
RDASI instruction, 67, 71, 287 
RDasr instruction, 287 
accessing I/O registers, 27 
implementation dependencies, 288, 484 
reading ASRs, 67 
RDCCR instruction, 67, 69, 287, 287 
RDFPRS instruction, 68, 73, 287 
RDGSR instruction, 68, 76, 287 
RDPC instruction, 68, 287 
reading PC register, 73 
RDPCR instruction, 68, 287 
RDPIC instruction, 68, 287, 449 
RDPR instruction, 68, 290 
accessing GL register, 97 
accessing non-register-window PR state 
registers, 86 
accessing register-window PR state registers, 82 
and register-window PR state registers, 81 
effect on TNPC register, 88 
effect on TPC register, 87 
effect on TSTATE register, 89
effect on TT register, 89
reading privileged registers, 86 
reading PSTATE register, 90 
reading the TICK register, 72 
registers read, 290 
RDSOFTINT instruction, 68, 77, 287 
RDSTICK instruction, 68, 80, 287 
RDSTICK_CMPR instruction, 68, 287
RDTICK instruction, 68, 72, 287 
RDTICK_CMPR instruction, 68, 287 
RDY instruction, 69 
read ancillary state register (RDasr) 
instructions, 287 
read state register instructions, 29 
read-after-write memory hazard, 385, 386 
real address, 14 
real ASI, 400 
real memory, 378 
reference MMU, 495 
reg, 496 
reg_or_imm, 500, 501
reg_plus_imm, 500
regaddr, 500 
register reference instructions, data flow order 
constraints, 385 
register window, 46, 48 
register window management instructions, 30 
register windows 
clean, 84, 85, 86, 117, 445, 450, 451, 452 
fill, 50, 85, 117, 118, 293, 294, 302, 447, 451, 452 
management of, 22 
overlapping, 49-51 
spill, 50, 85, 117, 118, 301, 302, 449, 450, 451, 452 
registers 
See also individual register (common) names 
accessing MMU registers, 467 
address space identifier (ASI), 383 
ASI (address space identifier), 71 
chip-level multithreading, See CMT 
clean windows (CLEANWIN), 83 
clock-tick (TICK), 449 
current window pointer (CWP), 82 
F (floating point), 365, 432 
floating-point, 24 
programming, 56 
floating-point registers state (FPRS), 73 
floating-point state (FSR), 58 
general status (GSR), 76 
global, 20, 24, 46, 48, 48, 481
global level (GL), 96
IER (SPARC V8), 360 
in, 46, 49, 300 
local, 46, 49 
next program counter (NPC), 73 
other windows (OTHERWIN), 84 
out, 46, 49, 300 
out #7, 52 
performance control (PCR), 74 
performance instrumentation counter (PIC), 75 
pixel storage registers, 223 
processor interrupt level (PIL) 

and PIC, 76 

and PIC counter overflow, 76 

and SOFTINT, 78 

and STICK_CMPR, 81 

and TICK_CMPR, 80 
processor interrupt level (PIL), 95 
program counter (PC), 72 
PSR (SPARC V8), 360 
R register #15, 52 
renaming mechanism, 386 
restorable windows (CANRESTORE), 83, 84 
savable windows (CANSAVE), 83 
scratchpad 

privileged, 417 
SOFTINT, 68 
SOFTINT_CLR pseudo-register, 68, 79 
SOFTINT_SET pseudo-register, 68, 78 
STICK, 80 
STICK_CMPR 

ASR summary, 68 

int_dis field, 78, 81 

stick_cmpr field, 81

and system software trapping, 81 
TBR (SPARC V8), 360 
TICK, 71 
TICK_CMPR 

int_dis field, 78, 80 

tick_cmpr field, 80
TICK_CMPR, 68, 79
trap base address (TBA), 90 
trap base address, See registers: TBA 
trap level (TL), 94 
trap level, See registers: TL 
trap next program counter (TNPC), 87 
trap next program counter, See registers: TNPC 
trap program counter (TPC), 86 
trap program counter, See registers: TPC 


trap state (TSTATE), 88 

trap state, See registers: TSTATE 
trap type (TT), 89, 434 

trap type, See registers: TT 

VA_WATCHPOINT, 449


visible to software in privileged mode, 86-97 


WIM (SPARC V8), 360 

window state (WSTATE), 84 
window state, See registers: WSTATE 
Y (32-bit multiply/divide), 69 


relaxed memory order (RMO) memory model, 262,
388
renaming mechanism, register, 386 
reorder unit, 385 
reordering instruction, 385 
reserved, 14 
fields in instructions, 133 
register field, 46 
reset 
reset trap, 429 
restartable deferred trap, 427 
restorable windows register, See CANRESTORE 
register 
RESTORE instruction, 50, 292-293 
actions, 117 
and current window, 52 
decrementing CWP register, 49 
fill trap, 447, 451 
followed by SAVE instruction, 50 
managing register windows, 30 
operation, 292 
performance trade-off, 292, 300 
and restorable windows (CANRESTORE) 
register, 83 
restoring register window, 292 
role in register state partitioning, 85 
restore synthetic instruction, 502 
RESTORED instruction, 118, 294 
creating inconsistent window state, 294 
fill handler, 293 
fill trap handler, 118, 452 
register window management, 30 
restricted, 14 
restricted address space identifier, 109 
restricted ASI, 383, 399 
resumable_error exception, 449
ret/retl synthetic instructions, 502
RETRY instruction, 296 
and restartable deferred traps, 428
effect on TNPC register, 88
effect on TPC register, 87 
effect on TSTATE register, 89 
generating illegal_instruction exception, 448 
modifying CCR.xcc, 70 
reexecuting trapped instruction, 452 
restoring gl value in GL, 97 
return from trap, 423 
returning to instruction after trap, 430 
target address, return from privileged traps, 28 
RETURN instruction, 298-299 
computing target address, 28 
fill trap, 447 
mem_address_not_aligned exception, 448
operation, 298 
reexecuting trapped instruction, 298 
RETURN vs. RESTORE instructions, 298 
RMO, 14 
RMO, See relaxed memory order (RMO) memory 
model 
rounding 
for floating-point results, 59 
in signed division, 304 
rounding direction (rd) field of FSR register, 160, 
171, 194, 215, 216, 218, 220, 221 
routine, nonleaf, 226 
rs1 instruction field 
arithmetic instructions, 134, 146, 149, 151, 270, 
272, 304, 311, 354, 356 
branch instructions, 148 
floating point arithmetic instructions, 160, 171, 
194 
floating point compare instructions, 169 
floating point load instructions, 236, 239, 243, 
258 
flush memory instruction, 174 
jump-and-link instruction, 226 
load instructions, 227, 247, 248, 250, 252 
logical operation instructions, 137, 275, 363 
move instructions, 268 
PREFETCH, 280 
RETURN, 298 
rs2 instruction field 
arithmetic instructions, 134, 146, 149, 151, 270, 
272, 275, 304, 311, 354, 356 
floating point arithmetic instructions, 160, 171, 
194, 215 
floating point compare instructions, 169 


floating point conversion instructions, 216, 218,
221
floating point instructions, 159 
floating point integer conversion, 173 
floating point load instructions, 236, 239, 243, 
258 
floating point move instructions, 178, 180 
floating point negate instructions, 196 
flush memory instruction, 174 
jump-and-link instruction, 226 
load instructions, 227, 250, 252 
logical operation instructions, 137, 363 
move instructions, 266, 268 
POPC, 278 
PREFETCH, 280 
RTO, 14 
RTS, 14 


S 
savable windows register, See CANSAVE register 
SAVE instruction, 49, 300 
actions, 116 
after RESTORE instruction, 298 
clean_window exception, 445, 451
and current window, 52 
decrementing CWP register, 49 
effect on privileged state, 301 
leaf procedure, 226 
and local/out registers of register window, 50
managing register windows, 30 
no clean window available, 84 
number of usable windows, 83 
operation, 300 
performance trade-off, 300 
role in register state partitioning, 85 
and savable windows (CANSAVE) register, 83 
spill trap, 449, 450, 452 
save synthetic instruction, 502 
SAVED instruction, 118, 302 
creating inconsistent window state, 302 
register window management, 30 
spill handler, 301, 302 
spill trap handler, 118, 452 
scaling of the coefficient, 189 
scratchpad registers 
privileged, 417 
SDIV instruction, 69, 304 
SDIVcc instruction, 69, 304 
SDIVX instruction, 272 
self-consistency, processor, 385
self-modifying code, 174, 175, 393 
sequencing MEMBAR instructions, 110 
sequential consistency, 380, 388, 389 
sequential consistency memory model, 389 
SETHI instruction, 110, 306 
creating 32-bit constant in R register, 27 
and NOP instruction, 273 
with rd = 0, 306 
setn synthetic instructions, 503 
shall (keyword), 14 
shared memory, 377 
shift count encodings, 309 
shift instructions, 28 
shift instructions, 110, 309 
short floating-point load and store instructions, 416 
short floating-point load instructions, 245 
short floating-point store instructions, 332 
should (keyword), 15 
SHUTDOWN instruction, 307 
SIAM instruction, 308 
side effect 
accesses, 379 
definition, 15 
I/O locations, 378 
instruction prefetching, 380 
real memory storage, 378 
visible, 379 
signalling NaN (not-a-number), 59, 218 
signed integer data type, 33 
signx synthetic instructions, 503 
SIMD, 15 
instruction data formats, 41-43 
simm10 instruction field 
move instructions, 268 
simm11 instruction field
move instructions, 266 
simm13 instruction field
arithmetic instructions, 270, 272, 275, 304, 311,
354, 356
floating point load instructions, 236, 239, 243, 258
flush memory instruction, 174 
jump-and-link instruction, 226 
load instructions, 227, 247, 248, 250, 252 
logical operation instructions, 137, 363 
POPC, 278
PREFETCH, 280
RETURN, 298 
single instruction/multiple data, See SIMD
SLL instruction, 309 
SLLX instruction, 309 
SMUL instruction, 69, 311 
SMULcc instruction, 69, 311 
SOFTINT register, 68, 77 
clearing, 457 
clearing of selected bits, 79 
communication from nucleus code to kernel 
code, 456 
scheduling interrupt vectors, 455, 456 
setting, 456 
SOFTINT register fields 
int_level, 78
sm (stick_int), 78
tm (tick_int), 78, 80
SOFTINT_CLR pseudo-register, 68, 79
SOFTINT_SET pseudo-register, 68, 78, 79 
software 
nucleus, 12 
software translation table, 461 
software trap, 349, 434 
software trap number (SWTN), 349 
software, nonprivileged, 73 
software trap number, 501 
source operands, 203, 208 
SPA 
ASI_TWIN_DW_NUCLEUS, 418
SPARC V8 compatibility 
LD, LDUW instructions, 227 
operations to I/O locations, 380 
read state register instructions, 288 
STA instruction renamed, 315 
STBAR instruction, 262 
STD instruction, 335 
STDA instruction, 337 
tagged subtract instructions, 353 
UNIMP instruction renamed, 222 
window_overflow exception superseded, 447
write state register instructions, 360 
SPARC V9 
compliance, 12 
features, 20 
SPARC V9 Application Binary Interface (ABI), 22 
speculative load, 15 
spill register window, 449 
FLUSH instruction, 118 
overflow/underflow, 50
RESTORE instruction, 117 
SAVE instruction, 85, 117, 300, 450 
SAVED instruction, 118, 302, 452 
selection of, 451 
trap handling, 452 
trap vectors, 301, 452 
window state, 85 
spill_n_normal exception, 301, 449 
and FLUSHW instruction, 177 
spill_n_other exception, 301, 449 
and FLUSHW instruction, 177 
SRA instruction, 309 
SRAX instruction, 309 
SRL instruction, 309 
SRLX instruction, 309 
stack frame, 300 
state registers (ASRs), 67-81 
STB instruction, 313 
STBA instruction, 314 
STBAR instruction, 288, 359, 386, 393 
STBLOCKF instruction, 317, 415 
STDF instruction, 102, 321, 449 
STDF_mem_address_not_aligned exception, 449
and store instructions, 322, 325
STDF/STDFA instruction, 102
STDFA instruction, 323 
alignment, 102 
ASIs for fp store operations, 416 
causing data_access_exception exception, 416
causing mem_address_not_aligned or
illegal_instruction exception, 416
causing STDF_mem_address_not_aligned
exception, 102, 449
for block load operations, 415 
for partial store operations, 416 
used with ASIs, 415 
STF instruction, 321 
STFA instruction, 323 
STFSR instruction, 58, 60, 61, 448 
STH instruction, 313 
STHA instruction, 314 
STICK register, 68, 72, 80 
counter field, 80 
npt field, 72, 80 
RDSTICK instruction, 287 
STICK_CMPR register, 68, 81
int_dis field, 78, 81
RDSTICK_CMPR instruction, 287
stick_cmpr field, 81
store 
block, See block store instructions 
partial, See partial store instructions 
short floating-point, See short floating-point store 
instructions 
store buffer 
merging, 379 
store floating-point into alternate space 
instructions, 323 
store instructions, 15, 101 
StoreLoad MEMBAR relationship, 261, 394 
StoreLoad predefined constant, 500 
stores to alternate space, 27, 71, 108 
StoreStore MEMBAR relationship, 261, 394 
StoreStore predefined constant, 500 
STPARTIALF instruction, 329 
STQF instruction, 103, 321, 450
STQF_mem_address_not_aligned exception, 450
STQF/STQFA instruction, 103
STQFA instruction, 103, 323
strand, 15 
strong consistency memory model, 389 
strong ordering, 389 
Strong Sequential Order, 390 
strongly ordered page, illegal access to, 446 
STSHORTF instruction, 332 
STTW instruction, 52, 102 
STTW instruction (deprecated), 334 
STTWA instruction, 52, 102 
STTWA instruction (deprecated), 336 
STW instruction, 313 
STWA instruction, 314 
STX instruction, 313 
STXA instruction, 314 
accessing nontranslating ASIs, 337 
mem_address_not_aligned exception, 314
referencing internal ASIs, 390 
STXFSR instruction, 58, 60, 61, 339, 448 
SUB instruction, 341
SUBC instruction, 341
SUBcc instruction, 110, 341
SUBCcc instruction, 341
subnormal number, 15 
subtract instructions, 341 
superscalar, 15 
supervisor software 
accessing special protected registers, 26 
definition, 15 
SWAP instruction, 25, 342
accessing doubleword simultaneously with other 
instructions, 343 
and data_access_exception (noncacheable page) 
exception, 446 
hardware primitive for mutual exclusion, 392 
identification of R register to be exchanged, 101 
in multiprocessor system, 247, 248 
memory accessing, 342 
ordering by MEMBAR, 393 
swap R register 
bit contents, 151 
with alternate space memory instructions, 343 
with memory instructions, 342 
SWAPA instruction, 343 
accessing doubleword simultaneously with other 
instructions, 343 
alternate space addressing, 26 
and data_access_exception (noncacheable page) 
exception, 446 
hardware primitive for mutual exclusion, 392 
in multiprocessor system, 247, 248 
ordering by MEMBAR, 393 
SWTN (software trap number), 349 
Sync predefined constant, 500 
synchronization, 263
synchronization, 15 
synthetic instructions 
mapping to SPARC V9 instructions, 502-504 
for assembly language programmers, 502 
mapping 
bclr, 504
bset, 504 
btog, 504 
btst, 504 
call, 502 
casn, 503 
clrn, 504 
cmp, 502 
dec, 504 
deccc, 504 
inc, 504 
inccc, 504 
iprefetch, 502 
jmp, 502 
movn, 504 
neg, 503 
not, 503 
restore, 502
ret/retl, 502
save, 502 
setn, 503 
signx, 503 
tst, 502 
vs. pseudo ops, 502 
system clock-tick register (STICK), 80 
system software 
accessing memory space by server program, 382 
ASIs allowing access to memory space, 384 
FLUSH instruction, 176, 396 
processing exceptions, 382 
trap types from which software must recover, 61 
System Tick Compare register, See STICK_CMPR 
register 
System Tick register, See STICK register 


T 
TA instruction, 348, 475 
TADDcc instruction, 111, 345 
TADDccTV instruction, 111, 449 
tag overflow, 111 
tag_overflow exception, 111, 345, 346, 347, 351, 353 
tag_overflow exception (deprecated), 449 
tagged arithmetic, 111 
tagged arithmetic instructions, 28 
tagged word data format, 33 
tagged words, 33 
TBA (trap base address) register, 90, 425 
establishing table address, 30, 423 
initialization, 433 
specification for RDPR instruction, 290 
specification for WRPR instruction, 361 
trap behavior, 16 
TBR register (SPARC V8), 360 
TCC instruction, 348 
Tcc instructions, 348
at TL > 0, 434
causing trap, 423 
causing trap to privileged trap handler, 434 
CCR register bits, 70 
generating htrap_instruction exception, 447 
generating illegal_instruction exception, 447
generating trap_instruction exception, 449
opcode maps, 471, 475, 476 
programming uses, 350 
trap table space, 30 
vector through trap table, 423 
TCS instruction, 348, 475
TE instruction, 348, 475 
termination deferred trap, 427 
test-and-set instruction, 393 
TG instruction, 348, 475 
TGE instruction, 348, 475 
TGU instruction, 348, 475 
thread, 16 
TICK register, 68 
counter field, 72, 485, 494 
inaccuracies between two readings of, 485, 494 
specification for RDPR instruction, 290 
TICK_CMPR register, 68, 79 
int_dis field, 78, 80 
tick_cmpr field, 80
timer registers, See TICK register and STICK register 
timing of instructions, 133 
tininess (floating-point), 66 
TL (trap level) register, 94, 425 
effect on privilege level to which a trap is
delivered, 432 
and implicit ASIs, 108 
displacement in trap table, 423 
executing RESTORED instruction, 294 
executing SAVED instruction, 302 
indexing for WRPR instruction, 361 
indexing privileged register after RDPR, 290 
setting register value after WRPR, 361 
specification for RDPR instruction, 290 
specification for WRPR instruction, 361 
and TBA register, 433 
and TPC register, 86 
and TSTATE register, 88 
and TT register, 89 
use in calculating privileged trap vector 
address, 433 
and WSTATE register, 84 
TL instruction, 348, 475 
TLB 
and 3-dimensional arrays, 141 
miss 
reloading TLB, 461, 466 
TLE instruction, 348, 475 
TLEU instruction, 348, 475 
TN instruction, 348, 475 
TNE instruction, 348, 475 
TNEG instruction, 348, 475 
TNPC (trap next program counter) register, 87 
saving NPC, 426
specification for RDPR instruction, 290
specification for WRPR instruction, 361 
TNPC (trap-saved next program counter) register, 16 
total order, 388 
total store order (TSO) memory model, 91, 262, 378, 
379, 388, 389
TPC (trap program counter) register, 16, 86 
address of trapping instruction, 291 
number of instances, 86 
specification for RDPR instructions, 290 
specification for WRPR instruction, 361 
TPOS instruction, 348, 475 
translating ASI, 400 
Translation Table Entry, See TTE 
trap 
See also exceptions and traps 
noncacheable accesses, 380 
when taken, 15 
trap enable mask (tem) field of FSR register, 431, 
432, 482
trap handler 
privileged mode, 434 
regular/nonfaulting loads, 12
returning from, 154, 296 
user, 62, 367 
trap level register, See TL register 
trap next program counter register, See TNPC register 
trap on integer condition codes instructions, 348 
trap program counter register, See TPC register 
trap state register, See TSTATE register 
trap type (TT) register, 434 
trap type register, See TT register 
trap_instruction (ISA) exception, 349, 350, 449
trap little endian (tle) field of PSTATE register, 90 
traps, 16 
See also exceptions and individual trap names 
categories 
deferred, 426, 427, 429 
disrupting, 426, 429 
precise, 426, 429
priority, 431, 442 
reset, 426, 429 
restartable 
implementation dependency, 428 
restartable deferred, 427 
termination deferred, 427 
caused by undefined feature/behavior, 16 
causes, 30
definition, 30, 424 
hardware, 434
hardware stack, 20 
level specification, 94 
model stipulations, 431 
nested, 20 
normal, 434 
processing, 443 
software, 349, 434 
stack, 443 
vector address, specifying, 90 
TSB, 16, 466 
cacheability, 466 
caching, 466 
indexing support, 466 
organization, 467 
TSO, 16 
TSO, See total store order (TSO) memory model 
tst synthetic instruction, 502 
TSTATE (trap state) register, 88 
DONE instruction, 154, 296 
registers saved after trap, 30 
restoring GL value, 97 
specification for RDPR instruction, 290 
specification for WRPR instruction, 361 
tstate, See trap state (TSTATE) register 
TSUBcc instruction, 111, 351 
TSUBccTV instruction, 111, 449 
TT (trap type) register, 89 
and privileged trap vector address, 433 
reserved values, 483 
specification for RDPR instruction, 290 
specification for WRPR instruction, 361 
and Tcc instructions, 350 
transferring trap control, 434 
window spill/fill exceptions, 84 
WRPR instruction, 361 
TTE, 16 
context ID field, 463 
cp (cacheability) field, 378 
cp field, 446, 464, 465 
cv field, 464, 465 
e field, 379, 396, 446, 464 
ie field, 464 
indexing support, 466 
nfo field, 396, 446, 463, 464 
p field, 446, 465 
size field, 466 
soft2 field, 463 
SPARC V8 equivalence, 462
taddr field, 463
v field, 463 
va_tag field, 463
w field, 465 
TVC instruction, 348, 475 
TVS instruction, 348, 475 
typewriter font, in assembly language syntax, 495 


U 
UDIV instruction, 69, 354 
UDIVcc instruction, 69, 354 
UDIVX instruction, 272 
ufm (underflow mask) field of FSR.tem, 66
UltraSPARC, previous ASIs 
ASI_NUCLEUS_QUAD_LDD (deprecated), 418
ASI_NUCLEUS_QUAD_LDD_L (deprecated), 418
ASI_NUCLEUS_QUAD_LDD_LITTLE
(deprecated), 418
ASI_PHYS_BYPASS_EC_WITH_EBIT_L, 418
ASI_PHYS_BYPASS_EC_WITH_EBIT, 418
ASI_PHYS_BYPASS_EC_WITH_EBIT_LITTLE,
418
ASI_PHYS_USE_EC, 418 
ASI_PHYS_USE_EC_L, 418 
ASI_PHYS_USE_EC_LITTLE, 418 
UMUL instruction, 69 
UMUL instruction (deprecated), 356 
UMULcc instruction, 69 
UMULcc instruction (deprecated), 356 
unassigned, 16 
unconditional branches, 142, 146, 162, 165 
undefined, 16 
underflow 
bits of FSR register 
accrued (ufa) bit of aexc field, 66, 367 
current (ufc) bit of cexc, 66 
current (ufc) bit of cexc field, 367 
mask (ufm) bit of FSR.tem, 66 
mask (ufm) bit of tem field, 367 
detection, 50 
occurrence, 451 
underflow mask (ufm) field of FSR.tem, 66 
unfinished_FPop floating-point trap type, 62, 160,
171, 195, 219, 220, 366 
handling, 67 
in normal computation, 61 
results after recovery, 62 
UNIMP instruction (SPARC V8), 222 
unimplemented, 16
unimplemented_FPop floating-point trap type, 62, 
159, 160, 170, 171, 173, 178, 184, 187, 195, 196, 
217, 219, 220, 366 
handling, 67 
result after recovery, 62 
unimplemented_LDTW exception, 251, 449 
unimplemented_STTW exception, 335, 449 
uniprocessor system, 16 
unrestricted, 16 
unrestricted ASI, 399 
unsigned integer data type, 33 
user application program, 17 
user trap handler, 62, 367 


V 
VA, 17 
VA_watchpoint exception, 449 
VA_WATCHPOINT register, 449 
value clipping, See FPACK instructions 
value semantics of input/output (I/O) 
locations, 378 
virtual 
address, 378 
address 0, 397 
virtual address, 17 
virtual core, 17 
virtual memory, 285 
VIS, 17 
VIS instructions 
encoding, 477, 478 
implicitly referencing GSR register, 76 
Visual Instruction Set, See VIS instructions 


W
W-A-R, See write-after-read memory hazard
watchpoint comparator, 93
W-A-W, See write-after-write memory hazard
WIM register (SPARC V8), 360
window fill exception, See also fill_n_normal
exception
window fill trap handler, 30
window overflow, 50, 450
window spill exception, See also spill_n_normal
exception
window spill trap handler, 30
window state register, See WSTATE register
window underflow, 451
window, clean, 300 
window_fill exception, 84, 117
RETURN, 298
window_spill exception, 84
word, 17 
alignment, 25, 102, 381 
data format, 33 
WRASI instruction, 67, 71, 358 
WRasr instruction, 358 
accessing I/O registers, 27 
attempt to write to ASR 5 (PC), 73 
cannot write to PC register, 73 
implementation dependencies, 484 
writing ASRs, 67 
WRCCR instruction, 67, 69, 70, 358 
WRFPRS instruction, 68, 73, 358 
WRGSR instruction, 68, 76, 358 
WRIER instruction (SPARC V8), 360 
write ancillary state register (WRasr) 
instructions, 358 
write ancillary state register instructions, See WRasr 
instruction 
write privileged register instruction, 361 
write-after-read memory hazard, 386 
write-after-write memory hazard, 385, 386 
WRPCR instruction, 68, 358 
WRPIC instruction, 68, 358, 449 
WRPR instruction 
accessing non-register-window PR state 
registers, 86 
accessing register-window PR state registers, 82 
and register-window PR state registers, 81 
effect on TNPC register, 88 
effect on TPC register, 87 
effect on TSTATE register, 89 
effect on TT register, 89 
writing the TICK register, 72 
writing to GL register, 97 
writing to PSTATE register, 90 
writing to TICK register, 72 
WRPSR instruction (SPARC V8), 360 
WRSOFTINT instruction, 68, 77, 358 
WRSOFTINT_CLR instruction, 68, 77, 79, 358, 457 
WRSOFTINT_SET instruction, 68, 77, 78, 358, 456 
WRSTICK_CMPR instruction, 68, 358 
WRTBR instruction (SPARC V8), 360 
WRTICK_CMPR instruction, 68, 358
WRWIM instruction (SPARC V8), 360 
WRY instruction, 67, 69, 358

WSTATE (window state) register 
description, 84 
and fill/spill exceptions, 451 
normal field, 451 
other field, 451 
overview, 81 
reading with RDPR instruction, 290 
spill exception, 177 
spill trap, 301 
writing with WRPR instruction, 361 


X
XNOR instruction, 363
XNORcc instruction, 363 
XOR instruction, 363 
XORcc instruction, 363 


Y
Y register, 67, 69
after multiplication completed, 270
content after divide operation, 304, 354
divide operation, 304, 354
multiplication, 270
unsigned multiply results, 311, 356
WRY instruction, 359
Y register (deprecated), 69


Z
zero virtual address, 397